Reputation: 1353
I have the following code and would like to select columns into a new data.frame:
library(dplyr)
df = data.frame(
Manhattan=c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0),
Brooklyn=c(0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0),
The_Bronx=c(1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0),
Staten_Island=c(0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
"2012"=c("P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "Q", "Q", "Q", "Q", "Q", "Q", "Q", "Q", "Q"),
"2013"=c("P", "P", "P", "P", "P", "P", "P", "P", "Q", "Q", "P", "P", "P", "P", "Q", "Q", "Q", "Q", "Q"),
"2014"=c("P", "P", "P", "Q", "Q", "P", "P", "Q", "Q", "Q", "Q", "Q", "P", "Q", "P", "P", "P", "Q", "Q"),
"2015"=c("P", "P", "P", "P", "P", "Q", "Q", "Q", "P", "Q", "P", "P", "Q", "Q", "Q", "Q", "Q", "Q", "Q"), check.names=FALSE)
df2 <- subset(df, select = c("Manhattan", "Queens", "The_Bronx"))
This throws the error:
Error in `[.data.frame`(x, r, vars, drop = drop) :
undefined columns selected
This happens because the column "Queens" does not exist in df. How can I override the error so that R proceeds to create df2 with only the columns "Manhattan" and "The_Bronx"?
Very important: my real data have hundreds of columns, so it is not feasible to manually remove names like "Queens" from the command df2 <- subset(df, select = c("Manhattan", "Queens", "The_Bronx")) (unless there is a trick for that?). Is there a way to solve this? Thank you.
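A quick way to list which requested names are causing the error is base R's setdiff(). This is a minimal sketch that assumes a cut-down stand-in for df (only a few of the columns above) so it runs on its own:

```r
# Cut-down stand-in for the question's df (assumption: only a few columns kept)
df <- data.frame(Manhattan = c(1, 1, 0), The_Bronx = c(1, 1, 0),
                 Staten_Island = c(0, 0, 0))
cols <- c("Manhattan", "Queens", "The_Bronx")

# Names that were requested but do not exist in df
missing_cols <- setdiff(cols, names(df))
print(missing_cols)
```

With hundreds of columns, the same call reports every absent name at once instead of failing on the first one.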
Upvotes: 0
Views: 950
Reputation: 1
The current version of dplyr supports passing a character vector of variable names as the second argument to dplyr::select(), but recommends wrapping that vector in all_of() to reduce ambiguity.
varnames <- c("mpg", "cyl", "carb")
# The following two lines both produce the same output:
dplyr::select(mtcars, varnames)
dplyr::select(mtcars, all_of(varnames))
Output (first four of 32 rows):
mpg cyl carb
Mazda RX4 21.0 6 4
Mazda RX4 Wag 21.0 6 4
Datsun 710 22.8 4 1
Hornet 4 Drive 21.4 6 1
...
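Note that all_of() does not by itself solve the question's problem: it is strict and errors when any name is missing, whereas any_of() silently skips absent names. A minimal sketch contrasting the two, assuming a two-column toy version of the question's df:

```r
library(dplyr)

# Toy stand-in for the question's df (assumption: only two columns kept)
df <- data.frame(Manhattan = c(1, 1, 0), The_Bronx = c(1, 1, 0))
cols <- c("Manhattan", "Queens", "The_Bronx")

# all_of() is strict: "Queens" is not in df, so select() raises an error
strict <- tryCatch(select(df, all_of(cols)), error = function(e) NULL)

# any_of() is lenient: names not present in df are dropped without an error
lenient <- select(df, any_of(cols))
print(names(lenient))
```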
Upvotes: -1
Reputation: 886938
We could also do
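cols <- c("Manhattan", "Queens", "The_Bronx")
library(dplyr)
library(stringr)  # str_c() comes from stringr, not dplyr
# matches() takes a regular expression, so names with no matching column are ignored
df %>%
  select(matches(str_c(cols, collapse = "|")))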
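One caveat with matches(): the pattern is unanchored and, by default, case-insensitive (ignore.case = TRUE), so it can also pick up columns whose names merely contain one of the strings. A minimal sketch of an anchored, case-sensitive variant, assuming a toy data frame with hypothetical look-alike columns ("manhattan", "The_Bronx_2") added to show the difference:

```r
library(dplyr)

# Toy data with look-alike column names (hypothetical, for illustration only)
df <- data.frame(Manhattan = 1:2, manhattan = 3:4,
                 The_Bronx = 5:6, The_Bronx_2 = 7:8)
cols <- c("Manhattan", "Queens", "The_Bronx")

# Unanchored, case-insensitive: all four columns match
loose <- df %>% select(matches(paste(cols, collapse = "|")))

# Anchored with ^(...)$ and case-sensitive: only exact names match
pattern <- paste0("^(", paste(cols, collapse = "|"), ")$")
strict <- df %>% select(matches(pattern, ignore.case = FALSE))

print(names(loose))
print(names(strict))
```

If any column names contain regex metacharacters (such as "."), they would also need escaping before being pasted into the pattern; any_of() avoids that issue entirely.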
Upvotes: 1
Reputation: 388807
In base R, you can use intersect() to select only the names that are present.
cols <- c("Manhattan", "Queens", "The_Bronx")
subset(df, select = intersect(names(df), cols))
# Manhattan The_Bronx
#1 1 1
#2 1 1
#3 0 0
#4 1 0
#5 1 0
#6 1 0
#7 1 0
#8 0 0
# ...
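The argument order of intersect() decides the column order of the result: intersect(names(df), cols) follows df's own order, while intersect(cols, names(df)) follows the order in which the names were requested. A minimal sketch, assuming a small stand-in df and a deliberately reordered cols:

```r
# Small stand-in for the question's df (assumption: three columns kept)
df <- data.frame(Manhattan = c(1, 1, 0), Brooklyn = c(0, 0, 0),
                 The_Bronx = c(1, 1, 0))
cols <- c("The_Bronx", "Queens", "Manhattan")

# Columns come out in df's order
by_df_order <- subset(df, select = intersect(names(df), cols))

# Columns come out in the requested order
by_request_order <- subset(df, select = intersect(cols, names(df)))

print(names(by_df_order))
print(names(by_request_order))
```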
Or use any_of() in dplyr:
library(dplyr)
df %>% select(tidyselect::any_of(cols))
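Because any_of() drops missing names silently, a typo in a column name can go unnoticed. One option is a small wrapper that still selects whatever exists but warns about what was skipped. select_available() below is a hypothetical helper, not part of dplyr; this is a sketch assuming a two-column toy df:

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): select the requested columns that
# exist and emit a warning listing the ones that were skipped
select_available <- function(data, cols) {
  skipped <- setdiff(cols, names(data))
  if (length(skipped) > 0) {
    warning("Skipping columns not found: ", paste(skipped, collapse = ", "))
  }
  select(data, any_of(cols))
}

# Toy stand-in for the question's df
df <- data.frame(Manhattan = c(1, 1, 0), The_Bronx = c(1, 1, 0))
df2 <- select_available(df, c("Manhattan", "Queens", "The_Bronx"))
print(names(df2))
```

The result keeps the columns in the order given in cols, since any_of() follows the order of the supplied vector.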
Upvotes: 5