Reputation: 39
I would like to subset a data frame (Data) by column names. I have a character vector with column name IDs I want to exclude (IDnames).
What I do normally is something like this:
Data[ ,!colnames(Data) %in% IDnames]
However, I am facing the problem that there is a name "X-360" and another one "X-360.1" in the columns. I only want to exclude the "X-360" (which is also in the character vector), but not "X-360.1" (which is not in the character vector, but extracted anyway). - So I want only exact matches, and it seems like this does not work with %in%.
It seems such a simple problem but I just cannot find a solution...
Update:
Indeed, the problem was that I had duplicated names in my data.frame! It took me a while to figure this out, because when I looked at the subsetted columns with
Data[ ,colnames(Data) %in% IDnames]
it showed "X-360" and "X-360.1" among the names, as stated above. But it seems this was just happening when subsetting the data, before there were just columns with the same name ("X-360") - and that happened because the data frame was set up from matrices with cbind. Here is a demonstration of what happened:
D1 <-matrix(rnorm(36),nrow=6)
colnames(D1) <- c("X-360", "X-400", "X-401", "X-300", "X-302", "X-500")
D2 <-matrix(rnorm(36),nrow=6)
colnames(D2) <- c("X-360", "X-406", "X-403", "X-300", "X-305", "X-501")
D <- cbind(D1, D2)
Data <- as.data.frame(D)
IDnames <- c("X-360", "X-302", "X-501")
Data[ ,colnames(Data) %in% IDnames]
X-360 X-302 X-360.1 X-501
1 -0.3658194 -1.7046575 2.1009329 0.8167357
2 -2.1987411 -1.3783129 1.5473554 -1.7639961
3 0.5548391 0.4022660 -1.2204003 -1.9454138
4 0.4010191 -2.1751914 0.8479660 0.2800923
5 -0.2790987 0.1859162 0.8349893 0.5285602
6 0.3189967 1.5910424 0.8438429 0.1142751
Learned another thing to be careful about when working with such data in the future...
Upvotes: 2
Views: 401
Reputation: 522636
One regex based solution here would be to form an alternation of exact keyword matches:
regex <- paste0("^(?:", paste(IDnames, collapse="|"), ")$")
Data[ , !grepl(regex, colnames(Data))]
Upvotes: 1