arranjdavis

Reputation: 735

How to select columns in an R dataframe based on string matching

I don't think this exact question has been asked yet (for R, anyway).

I want to retain any columns in my dataset (there are hundreds in actuality) that contain a certain string, and drop the rest. I have found plenty of examples of string searching column names, but nothing for the contents of the columns themselves.

As an example, say I have this dataset:

df = data.frame(v1 = c(1, 8, 7, 'No number'),
                v2 = c(5, 3, 5, 1),
                v3 = c('Nothing', 4, 2, 9),
                v4 = c(3, 8, 'Something', 6))

For this example, say I want to retain any columns with the string No, so that the resulting dataset is:

         v1      v3
1         1 Nothing
2         8       4
3         7       2
4 No number       9

How can I do this in R? I am happy with any sort of solution (e.g., base R, dplyr, etc.)!

Thanks in advance!

Upvotes: 3

Views: 1734

Answers (4)

jay.sf

Reputation: 72593

Simply

df[grep("No", df)]
#          v1      v3
# 1         1 Nothing
# 2         8       4
# 3         7       2
# 4 No number       9

This works because grep internally checks !is.character(x), and if that is true it essentially does:

s <- structure(as.character(df), names = names(df))
s
# v1 
# "c(\"1\", \"8\", \"7\", \"No number\")" 
# v2 
# "c(5, 3, 5, 1)" 
# v3 
# "c(\"Nothing\", \"4\", \"2\", \"9\")" 
# v4 
# "c(\"3\", \"8\", \"Something\", \"6\")" 
grep("No", s)
# [1] 1 3

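Note: this relies on the columns being character vectors rather than factors (the data.frame() default since R 4.0.0). Tested with: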

R.version.string
# [1] "R version 4.0.3 (2020-10-10)"
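One caveat follows from the deparsing step shown above: the pattern is matched against the deparsed text of each column, including the "c(" wrapper, not against the individual values. The following is a minimal sketch of that difference, assuming R >= 4.0 and the question's df; the variable names whole_df_hits and per_column_hits are made up for illustration.

```r
df <- data.frame(v1 = c(1, 8, 7, "No number"),
                 v2 = c(5, 3, 5, 1),
                 v3 = c("Nothing", 4, 2, 9),
                 v4 = c(3, 8, "Something", 6))

# grep() on the whole data frame searches the deparsed text of each column,
# e.g. "c(5, 3, 5, 1)", so the literal "c(" wrapper counts as a match.
whole_df_hits <- grep("c(", df, fixed = TRUE)

# Searching the actual values column by column does not see that wrapper.
per_column_hits <- which(vapply(
  df,
  function(col) any(grepl("c(", col, fixed = TRUE)),
  logical(1)
))
```

Here whole_df_hits covers all four columns while per_column_hits is empty, so the shortcut is best kept for patterns that cannot appear in the deparsed wrapper.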

Upvotes: 4

Roman Luštrik

Reputation: 70623

You can run grepl on each column and keep the column if any of its values match.

df = data.frame(v1 = c(1, 8, 7, 'No number'),
                v2 = c(5, 3, 5, 1),
                v3 = c('Nothing', 4, 2, 9),
                v4 = c(3, 8, 'Something', 6))

# TRUE for every column in which at least one value contains "No"
find.no <- sapply(X = df, FUN = function(x) {
  any(grepl("No", x = x))
})

> df[, find.no]
         v1      v3
1         1 Nothing
2         8       4
3         7       2
4 No number       9
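If only one column matches, df[, find.no] returns a plain vector instead of a data frame. A possible way to guard against that is sketched below; keep_matching_cols is a hypothetical helper name (not part of base R or of the answer above), and extra arguments are simply passed on to grepl.

```r
# Hypothetical helper: keep the columns in which any value matches `pattern`.
keep_matching_cols <- function(data, pattern, ...) {
  # One TRUE/FALSE per column; vapply() guarantees a logical vector.
  hit <- vapply(data, function(col) any(grepl(pattern, col, ...)), logical(1))
  # drop = FALSE keeps the result a data frame even for a single column.
  data[, hit, drop = FALSE]
}

df <- data.frame(v1 = c(1, 8, 7, "No number"),
                 v2 = c(5, 3, 5, 1),
                 v3 = c("Nothing", 4, 2, 9),
                 v4 = c(3, 8, "Something", 6))

res_no   <- keep_matching_cols(df, "No")                       # columns v1 and v3
res_some <- keep_matching_cols(df, "Something", fixed = TRUE)  # column v4 only
```

Passing fixed = TRUE treats the pattern as a literal string, which may be safer when the search text contains regex metacharacters.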

Upvotes: 1

xwhitelight

Reputation: 1579

Use the dplyr::select_if() function:

library(dplyr)
df <- df %>% select_if(function(col) any(grepl("No", col)))

Upvotes: 2

Ronak Shah

Reputation: 388797

Base R:

df[colSums(sapply(df, grepl, pattern = 'No')) > 0]

#         v1      v3
#1         1 Nothing
#2         8       4
#3         7       2
#4 No number       9

Using dplyr:

library(dplyr)
df %>% select(where(~any(grepl('No', .))))
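One edge case for the base R line: with a single-row data frame, sapply returns a named logical vector rather than a matrix, and colSums then fails. The sketch below illustrates this with a made-up one-row data frame (one_row is an assumption for the example) and uses vapply with any() as one possible alternative that does not depend on the matrix shape.

```r
one_row <- data.frame(v1 = "No number", v2 = 5, v3 = "9")

# With one row, each grepl() result has length 1, so sapply() simplifies
# to a named logical vector instead of a matrix.
m <- sapply(one_row, grepl, pattern = "No")

# colSums() needs at least two dimensions, so this call raises an error.
col_sums_result <- tryCatch(colSums(m), error = function(e) e)

# vapply() + any() yields one logical per column regardless of row count.
hit <- vapply(one_row, function(col) any(grepl("No", col)), logical(1))
res <- one_row[hit]
```

For data with more than one row, both forms select the same columns.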

Upvotes: 4
