Reputation: 1304
I have a dataset that looks something like this.
df <- tibble::tribble(
~name, ~x, ~y, ~z,
"N/A", 1, "testSmith", -100,
"N A", 3, "NOt available", -99,
"test Smith", NA, "test Smith", -98,
"Not Available", -99, "25", -101,
"test Smith", -98, "28", -1)
I would like to create a new data.table that keeps all the rows the string "test".
The final dataset should look something like this
name x y z
<chr> <dbl> <chr> <dbl>
1 N/A 1 testSmith -100
2 test Smith NA test Smith -98
3 test Smith -98 28 -1
I could do this column by column like this
setDT(df)[name%like%"test"|y%like%"test"]
The problem with this approach is that I have hundreds of string variables and I would like to find a more compact approach. I tried the followings but they do not work
chvar <- keep(trai,is.character)%>%names()
setDT(df)[chvar%like%"test"]#error
setDT(df)[(chvar)%like%"test"]#error
setDT(df)[.(chvar)%like%"test"]#empty dt
Does someone know how I could do it in a quick and efficient way?
Thanks a lot for your help
Upvotes: 3
Views: 1017
Reputation: 102710
Another data.table
option is using grepl
dt[rowSums(dt[, lapply(.SD, grepl, pattern = "test")]) > 0, .SDcols = .(name, y)]
such that
name x y z
1: N/A 1 testSmith -100
2: test Smith NA test Smith -98
3: test Smith -98 28 -1
Upvotes: 1
Reputation: 389275
In data.table
you can do :
library(data.table)
cols <- c('name', 'y')
setDT(df)
df[df[, Reduce(`|`, lapply(.SD, `%like%`, "test")), .SDcols = cols]]
# name x y z
#1: N/A 1 testSmith -100
#2: test Smith NA test Smith -98
#3: test Smith -98 28 -1
In base R :
subset(df, Reduce(`|`, lapply(df[cols], function(x) grepl('test', x))))
dplyr
:
library(dplyr)
df %>% filter(Reduce(`|`, across(all_of(cols), ~grepl('test', .x))))
lapply
/across
returns a list of TRUE
/FALSE
values for all columns. It will return TRUE
if 'test'
is present and FALSE
if it is not present. When we use it in combination with Reduce
and |
it will give TRUE
only of there is atleast one TRUE
value in the row. If all the values in the row are FALSE
it will returns FALSE
. We select only those rows which has at least one TRUE
value in it.
Upvotes: 5