Shinobi_Atobe

Reputation: 1973

dplyr: select all variables except for those contained in vector

This should be a simple issue but I am struggling.

I have a vector of variable names that I want to exclude from a data frame:

df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0(rep("variable_", 5), 1:5)

excluded_vars <- c("variable_1", "variable_3")

I would have thought that just excluding the object in the select statement with - would have worked:

select(df, -excluded_vars)

But I get the following error:

Error in -excluded_vars : invalid argument to unary operator

The same is true when using select_().

Any ideas?

Upvotes: 32

Views: 80770

Answers (9)

vpz

Reputation: 1044

You are almost there. Just wrap excluded_vars in c() and negate it,
like this:

select(df, -c(excluded_vars))

Upvotes: 9

TestingStuffs

Reputation: 11

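How about this? You need to pre-build the vector of column names to keep. Because the selection is passed as a named argument (.cols), select() renames the kept columns, so rename_with() is used afterwards to restore the original names in their actual order. It might work?

cc1 <- c("id")           # column(s) to drop; "id" is the example name used in this answer
nm <- names(df)          # all column names, in their original order
cc2 <- setdiff(nm, cc1)  # names of the columns to keep

# drop cc1, then restore the original names of the remaining columns
select(df, .cols = c(everything(), -cc1)) %>% rename_with(~ cc2)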

Upvotes: 1

rodavok

Reputation: 21

The select(..., -one_of()) method was giving me an error:

unused argument (-one_of(excluded_vars))

df[, -which(names(df) %in% excluded_vars)] worked for me instead (R 4.0.3).
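One caveat with the -which() form, sketched here in base R only: if none of the names match, which() returns an empty vector, and the negative index then selects zero columns instead of all of them. A logical index avoids this. The name not_a_column below is a made-up example, not something from the question.

```r
df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)
excluded_vars <- c("variable_1", "variable_3")

# Logical index: keep every column whose name is not in excluded_vars
kept_base <- df[, !(names(df) %in% excluded_vars), drop = FALSE]

# Hypothetical vector in which no name matches a column of df
no_match <- c("not_a_column")

# -which() on an empty match gives an empty index, so no columns are kept
n_which <- ncol(df[, -which(names(df) %in% no_match)])

# The logical index keeps all columns when nothing matches
n_logical <- ncol(df[, !(names(df) %in% no_match), drop = FALSE])
```

drop = FALSE keeps the result a data frame even when only one column is left.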

Upvotes: 0

Arthur Yip

Reputation: 6220

select(df, -any_of(excluded_vars)) is now the safest way to do this (the code will not break if a variable name that doesn't exist in df is included in excluded_vars)
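A minimal sketch of that behaviour, assuming dplyr 1.0.0 or later (where any_of() and all_of() are available). The name variable_9 is a made-up column that does not exist in df, added only to show the difference between the two helpers.

```r
library(dplyr)

df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)

# "variable_9" is hypothetical: it is not a column of df
excluded_vars <- c("variable_1", "variable_3", "variable_9")

# any_of() ignores names that are missing from df
kept <- select(df, -any_of(excluded_vars))

# all_of() is the strict counterpart: a missing name raises an error,
# which is caught here and turned into a marker string
strict <- tryCatch(
  select(df, -all_of(excluded_vars)),
  error = function(e) "error"
)
```

Which helper fits depends on whether a missing name should pass silently or be treated as a bug in excluded_vars.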

Upvotes: 15

ahmadzai

Reputation: 44

Simply use the negation operator: select(df, !c(col1, col2, col3))

Upvotes: -1

Gurgen Hovakimyan

Reputation: 9

You can write:

df %>% dplyr::select(colname)

Other packages also export a select() function, and this may be the problem. That is why you need to qualify the call with the package name.

Upvotes: 0

Shinobi_Atobe

Reputation: 1973

As of a more recent version of dplyr, the following now works:

select(df, -excluded_vars)

Upvotes: 13

C. Braun

Reputation: 5191

You need to use the one_of function:

select(df, -one_of(excluded_vars))

See the section on Useful Functions in the dplyr documentation for select for more about selecting based on variable names.

Upvotes: 32

erocoar

Reputation: 5893

With select_, you could simply use setdiff.

select_(df, .dots = setdiff(colnames(df), excluded_vars))
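For readers on a current dplyr, where select_() is deprecated, the same setdiff() idea can be sketched with select() and all_of(). This is an illustration assuming dplyr 1.0.0 or later, not the original code of this answer; keep_vars is a name introduced here.

```r
library(dplyr)

df <- data.frame(matrix(rexp(50), nrow = 10, ncol = 5))
names(df) <- paste0("variable_", 1:5)
excluded_vars <- c("variable_1", "variable_3")

# Names of the columns to keep, in their original order
keep_vars <- setdiff(colnames(df), excluded_vars)

# all_of() tells select() that keep_vars is a character vector of column names
kept_setdiff <- select(df, all_of(keep_vars))
```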

Upvotes: 1
