Apricot

Reputation: 3011

R delete columns from data frame matching regex pattern

I am running a script that generates data frames automatically. In certain cases I get columns whose names follow a specific pattern, such as d123 or d3452: the character d followed by a few digits. The number of digits ranges from one to four. I want to delete all the columns that match this pattern. The example data frame is as follows:

df <- data.frame(d1234=c(1,2,3), b=c(3,4,5),c=c(4,5,3), d3245=c(3,2,4))

The df looks like this:

  d1234 b c d3245
1     1 3 4     3
2     2 4 5     2
3     3 5 3     4

From this I want to delete only the first and the last columns, which match the pattern. I have tried the following:

df <- data.frame(d1234=c(1,2,3), b=c(3,4,5),c=c(4,5,3), d3245=c(3,2,4))
  colpat <- "[d[:digit:]]"
  if (colpat %in% names(df)) {
    d <- df[,!names(df) == colpat]  
  } else {
    d <- df
  }
  print(d)

But the columns still remain.

Upvotes: 1

Views: 953

Answers (2)

Andrew

Reputation: 5138

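For a tidyverse solution, you can pass a regex to the matches() helper when selecting columns. Anchoring the pattern with ^ and $ makes sure only names that consist entirely of d followed by digits are dropped.

library(dplyr)
df %>%
  select(-matches("^d\\d+$"))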

  b c
1 3 4
2 4 5
3 5 3
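To see why the start anchor matters for a pattern of this kind, here is a small base R sketch. The column names "id7" and "grid12" are hypothetical and were not in the question; they only illustrate names that end in d plus digits without being d followed by digits.

```r
# Hypothetical column names; "id7" and "grid12" are made up for illustration
nms <- c("d1234", "b", "id7", "grid12", "d3245")

# Unanchored at the start: matches any name that ends in "d" plus digits
loose <- grepl("d\\d+$", nms)

# Anchored at both ends: the whole name must be "d" followed by digits
strict <- grepl("^d\\d+$", nms)

print(nms[loose])
print(nms[strict])
```

With the question's data frame both patterns select the same columns, so the anchor only makes a difference when names like the hypothetical ones above can occur.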

Upvotes: 4

akrun

Reputation: 887108

We can use grep to match a regex pattern against the column names. The pattern checks for the letter 'd' at the start (^) of the string, followed by one or more digits (\\d+) up to the end ($) of the string. Setting invert = TRUE (the default is FALSE) makes grep return the indices of the names that do not match, and we subset the columns with that numeric index.

df[grep("^d\\d+$", names(df), invert = TRUE)]
#  b c
#1 3 4
#2 4 5
#3 5 3
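As a possible variation, the sketch below uses grepl with a logical index and also shows why the attempt in the question leaves the columns in place: %in% compares strings literally and never interprets colpat as a regex. The {1,4} quantifier is an assumption based on the question's statement that there are between one and four digits, and the variable name drop is only illustrative.

```r
df <- data.frame(d1234 = c(1, 2, 3), b = c(3, 4, 5), c = c(4, 5, 3), d3245 = c(3, 2, 4))

# %in% does exact string comparison, so the pattern text is never found among the names
"[d[:digit:]]" %in% names(df)

# grepl() applies the regex to every name; {1,4} limits the match to one to four digits
drop <- grepl("^d[0-9]{1,4}$", names(df))

# drop = FALSE keeps the result a data frame even if only one column survives
d <- df[, !drop, drop = FALSE]
print(d)
```

Because the logical index is all TRUE when nothing matches, no if/else branch is needed for data frames that do not contain such columns.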

Upvotes: 3
