Filter strings across multiple columns with data.table

Question

I have a dataset that looks something like this.

df <- tibble::tribble(
  ~name,           ~x,  ~y,              ~z,  
  "N/A",            1,   "testSmith",    -100, 
  "N A",            3,   "NOt available", -99,
  "test Smith",     NA,  "test Smith",    -98,
  "Not Available", -99, "25",             -101,
  "test Smith",    -98, "28",             -1)

I would like to create a new data.table that keeps all the rows containing the string "test".

The final dataset should look something like this

  name           x y              z
1 N/A            1 testSmith   -100
2 test Smith    NA test Smith   -98
3 test Smith   -98 28            -1

I could do this column by column like this

setDT(df)[name%like%"test"|y%like%"test"]

The problem with this approach is that I have hundreds of string variables, and I would like to find a more compact approach. I tried the following, but none of these work:

chvar <- keep(df, is.character) %>% names()
setDT(df)[chvar%like%"test"]#error
setDT(df)[(chvar)%like%"test"]#error
setDT(df)[.(chvar)%like%"test"]#empty dt

Does someone know how I could do it in a quick and efficient way?

Thanks a lot for your help

Ronak Shah · Accepted Answer

In data.table you can do:

library(data.table)

cols <- c('name', 'y')
setDT(df)

df[df[, Reduce(`|`, lapply(.SD, `%like%`, "test")), .SDcols = cols]]

#         name   x          y    z
#1:        N/A   1  testSmith -100
#2: test Smith  NA test Smith  -98
#3: test Smith -98         28   -1

In base R:

subset(df, Reduce(`|`, lapply(df[cols], function(x) grepl('test', x))))

dplyr:

library(dplyr)
df %>% filter(Reduce(`|`, across(all_of(cols), ~grepl('test', .x))))

lapply/across returns a list of logical vectors, one per column: TRUE where 'test' is present and FALSE where it is not. Combining them with Reduce and | gives TRUE for a row only if at least one of its values is TRUE; if all the values in the row are FALSE, it returns FALSE. We keep only the rows that have at least one TRUE value.
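
Since the question mentions hundreds of string variables, listing them by hand in cols may not be practical. The following is a minimal sketch, not part of the original answer, that picks the columns automatically and spells out the two steps described above. It assumes that "string variables" means columns of type character; the names dt, chr_cols, hits and keep_row are made up for illustration, and grepl with fixed = TRUE is used here in place of %like% so that "test" is matched as a literal string.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(
  name = c("N/A", "N A", "test Smith", "Not Available", "test Smith"),
  x    = c(1, 3, NA, -99, -98),
  y    = c("testSmith", "NOt available", "test Smith", "25", "28"),
  z    = c(-100, -99, -98, -101, -1)
)

# Assumption: the columns to search are exactly the character columns
chr_cols <- names(dt)[vapply(dt, is.character, logical(1))]

# Step 1: one logical vector per character column (TRUE where "test" occurs)
hits <- lapply(dt[, ..chr_cols], function(col) grepl("test", col, fixed = TRUE))

# Step 2: OR the vectors together element-wise, giving one flag per row
keep_row <- Reduce(`|`, hits)

# Keep only the rows where at least one character column matched
result <- dt[keep_row]
print(result)
```

If the columns to search follow a naming pattern rather than a type, chr_cols could instead be built with something like grep() on names(dt); the rest of the sketch would stay the same.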
