Reputation: 3011
I am running a script that generates data frames automatically. In certain cases I get columns whose names follow a specific pattern, such as d123 or d3452: the character d followed by a few digits. The number of digits can be as low as one and goes up to four. I want to delete all the columns that match this pattern. The example data frame is as follows:
df <- data.frame(d1234=c(1,2,3), b=c(3,4,5),c=c(4,5,3), d3245=c(3,2,4))
The df looks like this:
d1234 b c d3245
1 1 3 4 3
2 2 4 5 2
3 3 5 3 4
From this I want to delete only the first and the last columns, since those are the ones that match the pattern. I have tried the following:
df <- data.frame(d1234=c(1,2,3), b=c(3,4,5),c=c(4,5,3), d3245=c(3,2,4))
colpat <- "[d[:digit:]]"
if (colpat %in% names(df)) {
d <- df[,!names(df) == colpat]
} else {
d <- df
}
print(d)
But the columns still remain.
Upvotes: 1
Views: 953
Reputation: 5138
For a tidyverse solution, you could use a regex in the matches() helper when selecting columns.
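library(dplyr)

# ^ and $ anchor the pattern so only names that are exactly "d" plus digits are dropped
df %>%
  select(-matches("^d\\d+$"))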
b c
1 3 4
2 4 5
3 5 3
Upvotes: 4
Reputation: 887108
We can use grep for regex matching of a pattern in the column names. Here, the pattern checks for the letter 'd' at the start (^) of the string, followed by one or more digits (\\d+) until the end ($) of the string. With invert = TRUE (the default is FALSE), grep returns the numeric indices of the names that do not match, and we subset the columns with those indices:
df[grep("^d\\d+$", names(df), invert = TRUE)]
# b c
#1 3 4
#2 4 5
#3 5 3
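As a side note on why the attempt in the question leaves the columns in place: %in% and == compare strings literally, so a regex pattern never equals a column name. Below is a minimal sketch of a logical-index variant using grepl. The names is_d_col and d are only illustrative, and the {1,4} bound is an assumption taken from the "up to four digits" description in the question.

```r
df <- data.frame(d1234 = c(1, 2, 3), b = c(3, 4, 5), c = c(4, 5, 3), d3245 = c(3, 2, 4))

# %in% compares strings literally, so a regex pattern is never "in" the names
"^d\\d+$" %in% names(df)
# [1] FALSE

# grepl() interprets the pattern as a regex and returns one logical per name.
# Assumption: the digit run is limited to 1-4 digits, as the question describes.
is_d_col <- grepl("^d[0-9]{1,4}$", names(df))

# Keep the columns that do not match; drop = FALSE keeps the result a data frame
# even if only one column survives.
d <- df[, !is_d_col, drop = FALSE]
names(d)
# [1] "b" "c"
```

If no column matches, is_d_col is all FALSE and d is simply a copy of df, so the if/else from the question is not needed.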
Upvotes: 3