daniell
daniell

Reputation: 179

Filter rows that contain a certain string across all columns (with dplyr)

I have the following data:

data <- data.frame(name1 =c("John Smith", "A A", "A B", "A C", "John Donovan", "A F", "A D", "A F", "A D", "A C"),
      name2 = c("A V", "John Smith", "A D", "A R", "A O", "John Smith", "A M", "A V", "A L", "A Q"),
      name3 = c("John Donovan", "A R", "John Donovan", "A L", "A V", "A Q", "A T", "A N", "A L", "A X"))
> data
          name1      name2        name3
1    John Smith        A V John Donovan
2           A A John Smith          A R
3           A B        A D John Donovan
4           A C        A R          A L
5  John Donovan        A O          A V
6           A F John Smith          A Q
7           A D        A M          A T
8           A F        A V          A N
9           A D        A L          A L
10          A C        A Q          A X

I want to filter all rows that contain the string "John" across all columns (the number of columns can be higher than 3).

I try the following

data %>%
  filter(across(everything()), !str.detect("John"))

However it does not work and it produces:

Error: Problem with `filter()` input `..1`.
x Input `..1$name1` must be a logical vector, not a character.
i Input `..1` is `across(everything())`.

Do you have any idea, especially one using the across function.

Upvotes: 3

Views: 5583

Answers (3)

akrun
akrun

Reputation: 887891

We can also do this in a vectorized way

data[!rowSums(`dim<-`(grepl('John', as.matrix(data)), dim(data))),]

-output

#     name1 name2 name3
#4    A C   A R   A L
#7    A D   A M   A T
#8    A F   A V   A N
#9    A D   A L   A L
#10   A C   A Q   A X

Upvotes: 1

Abdessabour Mtk
Abdessabour Mtk

Reputation: 3888

filter takes a logical vector, thus when using across you need to pass the function to the across call as to apply that function on all the selected columns:

df %>% filter(across(everything(), ~ !str_detect(., "John")))
   V1  V2  V3
1 A C A R A L
2 A D A M A T
3 A F A V A N
4 A D A L A L
5 A C A Q A X

using the solution proposed in @ekoam's comment:

df %>% filter(rowSums(across(everything(), ~ str_detect(., "John"))) > 0)
            V1         V2           V3
1   John Smith        A V John Donovan
2          A A John Smith          A R
3          A B        A D John Donovan
4 John Donovan        A O          A V
5          A F John Smith          A Q

Just to make the picture a bit clearer :

df %>% filter(print(across(everything(), ~ !str_detect(., "John"))))
# A tibble: 10 x 3
   V1    V2    V3   
   <lgl> <lgl> <lgl>
 1 FALSE TRUE  FALSE
 2 TRUE  FALSE TRUE 
 3 TRUE  TRUE  FALSE
 4 TRUE  TRUE  TRUE 
 5 FALSE TRUE  TRUE 
 6 TRUE  FALSE TRUE 
 7 TRUE  TRUE  TRUE 
 8 TRUE  TRUE  TRUE 
 9 TRUE  TRUE  TRUE 
10 TRUE  TRUE  TRUE 
   V1  V2  V3
1 A C A R A L
2 A D A M A T
3 A F A V A N
4 A D A L A L
5 A C A Q A X

Notice that filter is &(and)ing the booleans by row i.e only rows with all TRUE value will be selected, those who have at least one FALSE will not. Now let's take a look at the code you provided in your comment:

 df %>% filter(print(across(everything(), ~ str_detect(., "John"))))
# A tibble: 10 x 3
   V1    V2    V3   
   <lgl> <lgl> <lgl>
 1 TRUE  FALSE TRUE 
 2 FALSE TRUE  FALSE
 3 FALSE FALSE TRUE 
 4 FALSE FALSE FALSE
 5 TRUE  FALSE FALSE
 6 FALSE TRUE  FALSE
 7 FALSE FALSE FALSE
 8 FALSE FALSE FALSE
 9 FALSE FALSE FALSE
10 FALSE FALSE FALSE
[1] V1 V2 V3
<0 rows> (or 0-length row.names)

All the rows have at least one FALSE, thus no rows are selected.

Upvotes: 4

Ronak Shah
Ronak Shah

Reputation: 389275

Here are couple of base R approaches.

Using sapply :

df[rowSums(sapply(df, grepl, pattern = 'John')) == 0, ]

#   name1 name2 name3
#4    A C   A R   A L
#7    A D   A M   A T
#8    A F   A V   A N
#9    A D   A L   A L
#10   A C   A Q   A X

With lapply :

df[!Reduce(`|`, lapply(df, grepl, pattern = 'John')), ]

Upvotes: 1

Related Questions