dbertolatus

Reputation: 73

Determine which column name is causing 'undefined columns selected' error when using subset()

I'm trying to extract a large data frame from a very large one, using

data.new <- subset(data, select = vector)

where vector is a character vector containing the names of the columns I'm trying to isolate. When I do this I get

Error in `[.data.frame`(x, r, vars, drop = drop) : 
  undefined columns selected

Is there a way to identify which specific column name in the vector is undefined? Through trial and error I've narrowed it down to about 400, but that still doesn't help.

Upvotes: 7

Views: 7462

Answers (1)

Ben Bolker

Reputation: 226057

Find the elements of your vector that are not %in% the names() of your data frame.

Working example:

dd <- data.frame(a=1,b=2)
subset(dd,select=c("a"))
##   a
## 1 1

Now try something that doesn't work:

v <- c("a","d")
subset(dd,select=v)
## Error in `[.data.frame`(x, r, vars, drop = drop) : 
##    undefined columns selected

v[!v %in% names(dd)]
## [1] "d"

Or

setdiff(v,names(dd))
## [1] "d"

The last few lines of the example code in ?match show a similar case.
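
For a long vector of names, the same check can be wrapped in a small helper. This is only a sketch: the function name missing_cols and the sample data are hypothetical, not from the question.

```r
# Hypothetical helper: return the requested column names
# that are not present in the data frame.
missing_cols <- function(df, cols) {
  setdiff(cols, names(df))
}

dd <- data.frame(a = 1, b = 2)
v <- c("a", "d", "e")

# Names in v that would trigger "undefined columns selected"
bad <- missing_cols(dd, v)
bad

# Keep only the names that do exist, so the subset no longer errors
good <- intersect(v, names(dd))
dd.new <- dd[, good, drop = FALSE]
```

Inspecting bad before subsetting shows every offending name at once. Dropping the missing names with intersect() is optional, and only appropriate if those columns really are not needed.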

Upvotes: 8
