Reputation: 735
I don't think this exact question has been asked yet (for R, anyway).
I want to retain any columns in my dataset (there are hundreds in actuality) that contain a certain string, and drop the rest. I have found plenty of examples of string searching column names, but nothing for the contents of the columns themselves.
As an example, say I have this dataset:
df = data.frame(v1 = c(1, 8, 7, 'No number'),
                v2 = c(5, 3, 5, 1),
                v3 = c('Nothing', 4, 2, 9),
                v4 = c(3, 8, 'Something', 6))
For this example, say I want to retain any columns that contain the string "No", so that the resulting dataset is:
v1 v3
1 1 Nothing
2 8 4
3 7 2
4 No number 9
How can I do this in R? I am happy with any sort of solution (e.g., base R, dplyr, etc.)!
Thanks in advance!
Upvotes: 3
Views: 1734
Reputation: 72593
Simply
df[grep("No", df)]
# v1 v3
# 1 1 Nothing
# 2 8 4
# 3 7 2
# 4 No number 9
This works because grep internally checks if (!is.character(x)) and, if that's true, it basically does:
s <- structure(as.character(df), names = names(df))
s
# v1
# "c(\"1\", \"8\", \"7\", \"No number\")"
# v2
# "c(5, 3, 5, 1)"
# v3
# "c(\"Nothing\", \"4\", \"2\", \"9\")"
# v4
# "c(\"3\", \"8\", \"Something\", \"6\")"
grep("No", s)
# [1] 1 3
Note:
R.version.string
# [1] "R version 4.0.3 (2020-10-10)"
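One caveat follows from this coercion: because the pattern is matched against the deparsed text of each column, it can also hit the c(...) wrapper itself rather than a cell value. A minimal sketch, assuming the same df as in the question and the as.character() behaviour shown above (R 4.0.3); the names whole_df and per_column are illustrative only:

```r
df <- data.frame(v1 = c(1, 8, 7, "No number"),
                 v2 = c(5, 3, 5, 1),
                 v3 = c("Nothing", 4, 2, 9),
                 v4 = c(3, 8, "Something", 6))

# grep() on the whole data frame searches the deparsed text of each column,
# e.g. "c(5, 3, 5, 1)", so the literal "c(" wrapper is part of what is searched.
whole_df <- grep("c", df, fixed = TRUE)

# Checking the cell values column by column avoids matching the wrapper.
per_column <- which(vapply(df,
                           function(col) any(grepl("c", col, fixed = TRUE)),
                           logical(1)))
```

Under those assumptions whole_df selects all four columns, while per_column selects none, since no cell actually contains a "c". For a pattern such as "No", which cannot occur in the wrapper, the two approaches agree.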
Upvotes: 4
Reputation: 70623
You can run grepl on each column and keep the columns where any value matches.
df = data.frame(v1 = c(1, 8, 7, 'No number'),
                v2 = c(5, 3, 5, 1),
                v3 = c('Nothing', 4, 2, 9),
                v4 = c(3, 8, 'Something', 6))
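# TRUE for every column in which at least one value matches "No"
find.no <- sapply(X = df, FUN = function(x) {
  any(grepl("No", x = x))
})
# drop = FALSE keeps a data frame even if only one column matches
df[, find.no, drop = FALSE]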
v1 v3
1 1 Nothing
2 8 4
3 7 2
4 No number 9
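The same idea can be wrapped in a small reusable function. This is only a sketch: keep_cols_matching is a hypothetical name, not part of base R or any package, and it uses vapply() instead of sapply() so that the result is always a logical vector.

```r
# Hypothetical helper: keep the columns of `data` in which at least one cell
# matches `pattern`. Extra arguments (e.g. fixed = TRUE, ignore.case = TRUE)
# are passed on to grepl().
keep_cols_matching <- function(data, pattern, ...) {
  hit <- vapply(data, function(col) any(grepl(pattern, col, ...)), logical(1))
  data[, hit, drop = FALSE]
}

df <- data.frame(v1 = c(1, 8, 7, "No number"),
                 v2 = c(5, 3, 5, 1),
                 v3 = c("Nothing", 4, 2, 9),
                 v4 = c(3, 8, "Something", 6))

keep_cols_matching(df, "No")    # keeps v1 and v3
keep_cols_matching(df, "Some")  # keeps only v4, still as a data frame
```

Passing drop = FALSE matters in the second call: without it, a single matching column would be returned as a plain vector.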
Upvotes: 1
Reputation: 1579
Use the dplyr::select_if() function (superseded by select(where()) since dplyr 1.0.0):
library(dplyr)
df <- df %>% select_if(function(col) any(grepl("No", col)))
Upvotes: 2
Reputation: 388797
Base R:
df[colSums(sapply(df, grepl, pattern = 'No')) > 0]
# v1 v3
#1 1 Nothing
#2 8 4
#3 7 2
#4 No number 9
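To see why the base R one-liner works, it can help to split it into steps. A minimal sketch using the df from the question; hits and col_hits are illustrative names. Note that it assumes the data frame has more than one row: with a single row, sapply() would return a vector rather than a matrix and colSums() would fail.

```r
df <- data.frame(v1 = c(1, 8, 7, "No number"),
                 v2 = c(5, 3, 5, 1),
                 v3 = c("Nothing", 4, 2, 9),
                 v4 = c(3, 8, "Something", 6))

# One logical column per data frame column: TRUE where the cell matches "No".
hits <- sapply(df, grepl, pattern = "No")

# Number of matching cells in each column.
col_hits <- colSums(hits)

# Keep the columns with at least one match.
df[col_hits > 0]
```

Here hits is a 4 x 4 logical matrix, and col_hits is 1 for v1 and v3 and 0 for v2 and v4, which is why only those two columns survive.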
Using dplyr:
library(dplyr)
df %>% select(where(~any(grepl('No', .))))
Upvotes: 4