How to select specific columns containing certain strings/characters?

Question

I have this dataframe:

df1 <- data.frame(a = c("correct", "wrong", "wrong", "correct"),
  b = c(1, 2, 3, 4),
  c = c("wrong", "wrong", "wrong", "wrong"),
  d = c(2, 2, 3, 4))

a       b c     d
correct 1 wrong 2
wrong   2 wrong 2
wrong   3 wrong 3
correct 4 wrong 4

and would like to select only the columns that contain either of the strings 'correct' or 'wrong' (i.e., columns a and c in df1), such that I get this dataframe:

df2 <- data.frame(a = c("correct", "wrong", "wrong", "correct"),
        c = c("wrong", "wrong", "wrong", "wrong"))

        a     c
1 correct wrong
2   wrong wrong
3   wrong wrong
4 correct wrong

Can I use dplyr to do this? If not, what function(s) can I use to do this? The example I've given is straightforward, in that I can just do this (dplyr):

select(df1, a, c)

However, my actual dataframe has about 700 variables/columns, a few hundred of which contain the strings 'correct' or 'wrong', and I don't know their variable/column names.

Any suggestions as to how to do this quickly? Thanks a lot!

Colonel Beauvel · Accepted Answer

You can use base R's Filter, which applies the test function to each of df1's columns and keeps the columns for which it returns TRUE:

Filter(function(u) any(c('wrong','correct') %in% u), df1)
#        a     c
#1 correct wrong
#2   wrong wrong
#3   wrong wrong
#4 correct wrong
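
To make the per-column test more visible, the same selection can be sketched in two steps with sapply and logical indexing. This is only an illustration of what Filter does here; the names keep and df2 are chosen for the example.

```r
# Same data as in the question
df1 <- data.frame(a = c("correct", "wrong", "wrong", "correct"),
                  b = c(1, 2, 3, 4),
                  c = c("wrong", "wrong", "wrong", "wrong"),
                  d = c(2, 2, 3, 4))

# One TRUE/FALSE per column: does the column contain either string?
keep <- sapply(df1, function(u) any(c("wrong", "correct") %in% u))
print(keep)

# Index the data frame with the logical vector to keep only matching columns
df2 <- df1[keep]
print(names(df2))
```

Having the logical vector available can be handy with several hundred columns, for example to count the matches with sum(keep) before subsetting.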

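You can also use grepl, which matches the pattern anywhere within a value, so a value such as 'incorrect' would also count as a match: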

Filter(function(u) any(grepl('wrong|correct',u)), df1)
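
Since the question asks about dplyr, a possible equivalent is sketched below. It assumes dplyr 1.0.0 or later, where the where() selection helper is available; the predicate is the same exact-match test used with Filter above.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for where()

# Same data as in the question
df1 <- data.frame(a = c("correct", "wrong", "wrong", "correct"),
                  b = c(1, 2, 3, 4),
                  c = c("wrong", "wrong", "wrong", "wrong"),
                  d = c(2, 2, 3, 4))

# where() applies the predicate to each column and keeps those returning TRUE
df2 <- select(df1, where(function(x) any(x %in% c("correct", "wrong"))))
print(names(df2))
```

As with Filter, no column names need to be known in advance, which is the point when there are around 700 columns.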
