Bahi8482

Reputation: 538

Apply filter criteria to variables that contain/start with certain string in R

I am trying to filter a data frame by applying a condition to every variable whose name contains a certain string.

In the example below, I want to find the subjects for which any test result contains "d".

d <- structure(list(
  ID    = c("a", "b", "c", "d", "e"),
  test1 = c("a", "b", "a", "d", "a"),
  test2 = c("a", "b", "b", "a", "s"),
  test3 = c("b", "c", "c", "c", "d"),
  test4 = c("c", "d", "a", "a", "f")
), class = "data.frame", row.names = c(NA, -5L))

I can use dplyr and write the conditions one by one, joined with |. That works for a small example like this, but it would be time-consuming for my real data.

library(dplyr)
library(stringr)

# Inside filter(), columns can be referenced by name; the d$ prefix is not needed.
d %>%
  filter(str_detect(test1, "d") |
         str_detect(test2, "d") |
         str_detect(test3, "d") |
         str_detect(test4, "d"))

The output shows that subjects b, d and e meet the criteria:

  ID test1 test2 test3 test4
1  b     b     b     c     d
2  d     d     a     c     a
3  e     a     s     d     f

The output is what I need, but I am looking for an easier way: for example, a way to apply the filter criterion to all variables whose names contain "test". I know about the contains() helper in dplyr for selecting variables, and I tried it here, but it does not work:

d %>% filter(str_detect(contains("test"), "d"))

Is there a way to write this code differently, or another way to achieve the same goal?

Thank you.

Upvotes: 0

Views: 752

Answers (1)

Ronak Shah

Reputation: 388907

In base R you can use lapply/sapply:

d[Reduce(`|`, lapply(d[-1], grepl, pattern = 'd')), ]
#d[rowSums(sapply(d[-1], grepl, pattern = 'd')) > 0, ]


#  ID test1 test2 test3 test4
#2  b     b     b     c     d
#4  d     d     a     c     a
#5  e     a     s     d     f
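
The base R line above selects the test columns by position (`d[-1]` drops the first column, ID). If the columns of interest are not everything except the first one, a variant that picks them by name may be safer. The following is only a sketch under the assumption that every relevant column name starts with "test"; the names `test_cols`, `hits` and `result` are made up for illustration.

```r
# Example data copied from the question.
d <- data.frame(
  ID    = c("a", "b", "c", "d", "e"),
  test1 = c("a", "b", "a", "d", "a"),
  test2 = c("a", "b", "b", "a", "s"),
  test3 = c("b", "c", "c", "c", "d"),
  test4 = c("c", "d", "a", "a", "f"),
  stringsAsFactors = FALSE
)

# Select columns by name instead of position
# (assumption: all columns of interest start with "test").
test_cols <- grep("^test", names(d), value = TRUE)

# Build one logical vector per test column, then combine them row by row with OR.
# fixed = TRUE treats "d" as a literal string rather than a regular expression.
hits <- Reduce(`|`, lapply(d[test_cols], grepl, pattern = "d", fixed = TRUE))

# Keep only the rows where at least one test column matched.
result <- d[hits, ]
print(result)
```

Because the columns are chosen by name, the ID column is never searched, so subject "d" is kept only because test1 contains "d", not because of its ID.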

If you are interested in a dplyr solution, you can use any of the methods below:

library(dplyr)
library(stringr)

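#1. Scoped verb; filter_at() is superseded since dplyr 1.0.0 but still works.
d %>% 
  filter_at(vars(starts_with('test')), any_vars(str_detect(., 'd')))

#2. Row by row: keep a row if any of its test columns contains 'd'.
d %>%
  rowwise() %>%
  filter(any(str_detect(c_across(starts_with('test')), 'd')))

#3. One logical column per test variable, combined with OR.
#   A lambda is used because passing extra arguments through across()'s `...`
#   is deprecated as of dplyr 1.1.0.
d %>%
  filter(Reduce(`|`, across(starts_with('test'), ~ str_detect(.x, 'd'))))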
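
Newer dplyr releases also provide `if_any()`, which expresses "at least one of these columns matches" directly inside `filter()`. This is not part of the methods listed above; it is a sketch that assumes dplyr 1.0.4 or later is installed, with `result` as an illustrative name.

```r
library(dplyr)
library(stringr)

# Example data copied from the question.
d <- data.frame(
  ID    = c("a", "b", "c", "d", "e"),
  test1 = c("a", "b", "a", "d", "a"),
  test2 = c("a", "b", "b", "a", "s"),
  test3 = c("b", "c", "c", "c", "d"),
  test4 = c("c", "d", "a", "a", "f"),
  stringsAsFactors = FALSE
)

# if_any() returns TRUE for a row when the predicate holds in at least one
# of the selected columns (assumption: dplyr >= 1.0.4).
# fixed("d") matches the literal string "d" instead of a regular expression.
result <- d %>%
  filter(if_any(starts_with("test"), ~ str_detect(.x, fixed("d"))))

print(result)
```

Swapping `starts_with("test")` for `contains("test")` gives the selection the question originally attempted, and `if_all()` would instead require a match in every selected column.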

Upvotes: 2
