Adn

Reputation: 39

Subset an R data frame using only exact matches of a character vector

I would like to subset a data frame (Data) by column names. I have a character vector with column name IDs I want to exclude (IDnames).

What I do normally is something like this:

Data[ ,!colnames(Data) %in% IDnames]

However, the columns include one named "X-360" and another named "X-360.1". I only want to exclude "X-360" (which is in the character vector), not "X-360.1" (which is not in the character vector but gets excluded anyway). So I want exact matches only, and it seems that %in% does not provide that.

It seems such a simple problem but I just cannot find a solution...
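A minimal check suggests that %in% itself does compare whole strings, so the cause is probably somewhere else. The column names below are made up for illustration and are not my real data:

```r
# Hypothetical column names, for illustration only
cols <- c("X-360", "X-360.1", "X-400")
IDnames <- c("X-360")

# %in% tests each element of cols for an exact match in IDnames
hits <- cols %in% IDnames
hits
# TRUE FALSE FALSE
```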

Update:

Indeed, the problem was that I had duplicated names in my data.frame! It took me a while to figure this out, because when I looked at the subsetted columns with

Data[ ,colnames(Data) %in% IDnames]

it showed "X-360" and "X-360.1" among the names, as stated above. But the ".1" suffix only appeared when subsetting; in the original data frame there were simply two columns with the same name ("X-360"). That happened because the data frame was built from matrices with cbind. Here is a demonstration of what happened:

D1 <- matrix(rnorm(36), nrow = 6)
colnames(D1) <- c("X-360", "X-400", "X-401", "X-300", "X-302", "X-500")

D2 <- matrix(rnorm(36), nrow = 6)
colnames(D2) <- c("X-360", "X-406", "X-403", "X-300", "X-305", "X-501")

D <- cbind(D1, D2)
Data <- as.data.frame(D)

IDnames <- c("X-360", "X-302", "X-501")

Data[ ,colnames(Data) %in% IDnames]
       X-360      X-302    X-360.1      X-501
1 -0.3658194 -1.7046575  2.1009329  0.8167357
2 -2.1987411 -1.3783129  1.5473554 -1.7639961
3  0.5548391  0.4022660 -1.2204003 -1.9454138
4  0.4010191 -2.1751914  0.8479660  0.2800923
5 -0.2790987  0.1859162  0.8349893  0.5285602
6  0.3189967  1.5910424  0.8438429  0.1142751

Learned another thing to be careful about when working with such data in the future...
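For anyone hitting the same thing, here is a rough sketch of how the duplicates could be detected and resolved before subsetting. It uses a small made-up data frame (m1, m2 and their values are illustrative, not my real data), and making the names unique with make.unique is only one possible choice; whether the second "X-360" column should be kept depends on the data.

```r
# Small illustrative matrices sharing one column name, combined with cbind()
m1 <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("X-360", "X-400")))
m2 <- matrix(5:8, nrow = 2, dimnames = list(NULL, c("X-360", "X-501")))
Data <- as.data.frame(cbind(m1, m2))

# List column names that occur more than once
dup <- colnames(Data)[duplicated(colnames(Data))]
dup
# "X-360"

# One option: make the names unique up front, so later matches are unambiguous
colnames(Data) <- make.unique(colnames(Data))

IDnames <- c("X-360")
kept <- Data[, !colnames(Data) %in% IDnames, drop = FALSE]
colnames(kept)
# "X-400" "X-360.1" "X-501"
```

With the names made unique first, only the first "X-360" column is dropped and the renamed "X-360.1" column stays.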

Upvotes: 2

Views: 401

Answers (1)

Tim Biegeleisen

Reputation: 522636

One regex-based solution would be to build an alternation of the names, anchored at both ends so that each keyword has to match the whole column name:

regex <- paste0("^(?:", paste(IDnames, collapse="|"), ")$")
Data[ , !grepl(regex, colnames(Data))]
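One caveat with the pattern above: the names are pasted in as regex, so a character such as "." would match any character. A possible variant, sketched here with made-up names (cols and the second entry of IDnames are hypothetical), wraps each name in \Q...\E so it is taken literally. That quoting is PCRE syntax, so it assumes perl = TRUE.

```r
# Hypothetical names, chosen so that one ID contains a regex metacharacter
IDnames <- c("X-360", "X.302")
cols <- c("X-360", "X-360.1", "X.302", "Xa302")

# Unquoted: the "." in "X.302" matches any character, so "Xa302" is hit too
regex_raw <- paste0("^(?:", paste(IDnames, collapse = "|"), ")$")
raw_hits <- grepl(regex_raw, cols, perl = TRUE)
raw_hits
# TRUE FALSE TRUE TRUE

# Quoted with \Q...\E: every name is matched as a literal string
quoted <- paste0("\\Q", IDnames, "\\E", collapse = "|")
regex_lit <- paste0("^(?:", quoted, ")$")
lit_hits <- grepl(regex_lit, cols, perl = TRUE)
lit_hits
# TRUE FALSE TRUE FALSE
```

The quoted version would then be used the same way, for example Data[, !grepl(regex_lit, colnames(Data), perl = TRUE)].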

Upvotes: 1
