Sylvia Rodriguez

Reputation: 1353

R dplyr subset with missing columns

I have the following code and would like to select columns into a new data.frame.

library(dplyr)
df = data.frame(
    Manhattan=c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0), 
    Brooklyn=c(0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0), 
    The_Bronx=c(1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0), 
    Staten_Island=c(0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0), 
    "2012"=c("P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "Q", "Q", "Q", "Q", "Q", "Q", "Q", "Q", "Q"), 
    "2013"=c("P", "P", "P", "P", "P", "P", "P", "P", "Q", "Q", "P", "P", "P", "P", "Q", "Q", "Q", "Q", "Q"), 
    "2014"=c("P", "P", "P", "Q", "Q", "P", "P", "Q", "Q", "Q", "Q", "Q", "P", "Q", "P", "P", "P", "Q", "Q"), 
    "2015"=c("P", "P", "P", "P", "P", "Q", "Q", "Q", "P", "Q", "P", "P", "Q", "Q", "Q", "Q", "Q", "Q", "Q"), check.names=FALSE)
df2 <- subset(df, select = c("Manhattan", "Queens", "The_Bronx"))

This throws the error:

Error in `[.data.frame`(x, r, vars, drop = drop) : 
   undefined columns selected

The error occurs because the column "Queens" is missing from df. How can I override the error so that R creates df2 with only the columns "Manhattan" and "The_Bronx"?

Very important: my real data have hundreds of columns, so manually removing columns like "Queens" from the call df2 <- subset(df, select = c("Manhattan", "Queens", "The_Bronx")) is not feasible (unless there is a trick for that?). Is there a way to solve this? Thank you.

Upvotes: 0

Views: 950

Answers (3)

jmlpgh

Reputation: 1

The current version of dplyr supports passing a character vector of variable names into the second argument to dplyr::select(), but recommends wrapping that vector in all_of() to reduce ambiguity.

varnames <- c("mpg", "cyl", "carb")

The following two lines both produce the same output:

dplyr::select(mtcars, varnames)
dplyr::select(mtcars, all_of(varnames))

Output (first four rows):

                 mpg cyl carb
 Mazda RX4      21.0   6    4
 Mazda RX4 Wag  21.0   6    4
 Datsun 710     22.8   4    1
 Hornet 4 Drive 21.4   6    1
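
One caveat for the question as asked: all_of() is strict, so it raises an error when any requested name is absent, whereas any_of() skips absent names. The sketch below illustrates the difference on a made-up two-column data frame (toy is a hypothetical stand-in, not the asker's df).

```r
library(dplyr)

# Hypothetical toy data: "Queens" is deliberately not a column here.
toy <- data.frame(Manhattan = c(1, 0), The_Bronx = c(0, 1))
cols <- c("Manhattan", "Queens", "The_Bronx")

# all_of() is strict: a missing name is an error, captured here with tryCatch().
strict <- tryCatch(select(toy, all_of(cols)), error = function(e) "error")

# any_of() is lenient: names that do not exist are dropped from the selection.
lenient <- select(toy, any_of(cols))
names(lenient)
```

So all_of() is the right choice when a missing column should stop the script, and any_of() when missing columns are expected.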

Upvotes: -1

akrun

Reputation: 886938

We could also do

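cols <- c("Manhattan", "Queens", "The_Bronx")
library(dplyr)
library(stringr)  # provides str_c()
# matches() takes a regular expression, so column names are matched partially
df %>%
   select(matches(str_c(cols, collapse="|")))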
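
Because matches() treats its argument as a regular expression, an unanchored pattern also selects columns that merely contain one of the names. A minimal sketch of the difference, assuming a made-up data frame with a hypothetical Manhattan_North column, and using base paste() in place of str_c():

```r
library(dplyr)

# Hypothetical toy data with a column whose name contains "Manhattan".
toy <- data.frame(Manhattan = 1, Manhattan_North = 2, The_Bronx = 3)
cols <- c("Manhattan", "Queens", "The_Bronx")

# Unanchored alternation: partial matches are selected too.
loose <- paste(cols, collapse = "|")
loose_names <- names(select(toy, matches(loose)))

# Anchoring with ^( ... )$ restricts the match to whole column names.
exact <- paste0("^(", loose, ")$")
exact_names <- names(select(toy, matches(exact)))
```

Here loose_names includes Manhattan_North while exact_names does not. Note that matches() ignores case by default, and names containing regex metacharacters would need escaping, so this approach is best suited to simple names.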

Upvotes: 1

Ronak Shah

Reputation: 388807

In base R, you can use intersect to select only the names which are present.

cols <- c("Manhattan", "Queens", "The_Bronx")
subset(df, select = intersect(names(df), cols))

#   Manhattan The_Bronx
#1          1         1
#2          1         1
#3          0         0
#4          1         0
#5          1         0
#6          1         0
#7          1         0
#8          0         0
#...
#....
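
A small variation, sketched on a made-up data frame (toy is hypothetical): the argument order of intersect() decides the column order of the result, and setdiff() lists the names that were skipped.

```r
# Hypothetical toy data whose column order differs from the requested order.
toy <- data.frame(The_Bronx = c(1, 0), Manhattan = c(0, 1))
cols <- c("Manhattan", "Queens", "The_Bronx")

# Columns come back in the data frame's order.
by_df <- toy[intersect(names(toy), cols)]

# Columns come back in the requested order.
by_request <- toy[intersect(cols, names(toy))]

# Requested names that do not exist in the data.
missing_cols <- setdiff(cols, names(toy))
```

With hundreds of columns, checking missing_cols is a cheap way to catch misspelled names that would otherwise be dropped silently.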

Or use any_of() in dplyr:

library(dplyr)
df %>% select(tidyselect::any_of(cols))
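
If silently dropping columns feels risky, the any_of() call can be wrapped in a small function that reports what was skipped. This is only a sketch: select_present() and toy are hypothetical names, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: keep the columns that exist, warn about the rest.
select_present <- function(data, cols) {
  missing_cols <- setdiff(cols, names(data))
  if (length(missing_cols) > 0) {
    warning("Skipping missing columns: ", paste(missing_cols, collapse = ", "))
  }
  select(data, any_of(cols))
}

# Hypothetical toy data: "Queens" is absent and Brooklyn is not requested.
toy <- data.frame(The_Bronx = c(1, 0), Manhattan = c(0, 1), Brooklyn = c(1, 1))
res <- select_present(toy, c("Manhattan", "Queens", "The_Bronx"))
names(res)
```

The result keeps the columns in the order they were requested, and the warning names "Queens" as the skipped column.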

Upvotes: 5
