Reputation: 179
I have the following data:
data <- data.frame(name1 =c("John Smith", "A A", "A B", "A C", "John Donovan", "A F", "A D", "A F", "A D", "A C"),
name2 = c("A V", "John Smith", "A D", "A R", "A O", "John Smith", "A M", "A V", "A L", "A Q"),
name3 = c("John Donovan", "A R", "John Donovan", "A L", "A V", "A Q", "A T", "A N", "A L", "A X"))
> data
name1 name2 name3
1 John Smith A V John Donovan
2 A A John Smith A R
3 A B A D John Donovan
4 A C A R A L
5 John Donovan A O A V
6 A F John Smith A Q
7 A D A M A T
8 A F A V A N
9 A D A L A L
10 A C A Q A X
I want to filter all rows that contain the string "John" across all columns (the number of columns can be higher than 3).
I try the following
data %>%
filter(across(everything()), !str.detect("John"))
However it does not work and it produces:
Error: Problem with `filter()` input `..1`.
x Input `..1$name1` must be a logical vector, not a character.
i Input `..1` is `across(everything())`.
Do you have any idea, especially one using the across function.
Upvotes: 3
Views: 5583
Reputation: 887891
We can also do this in a vectorized way
data[!rowSums(`dim<-`(grepl('John', as.matrix(data)), dim(data))),]
-output
# name1 name2 name3
#4 A C A R A L
#7 A D A M A T
#8 A F A V A N
#9 A D A L A L
#10 A C A Q A X
Upvotes: 1
Reputation: 3888
filter takes a logical vector, thus when using across you need to pass the function to the across call as to apply that function on all the selected columns:
df %>% filter(across(everything(), ~ !str_detect(., "John")))
V1 V2 V3
1 A C A R A L
2 A D A M A T
3 A F A V A N
4 A D A L A L
5 A C A Q A X
using the solution proposed in @ekoam's comment:
df %>% filter(rowSums(across(everything(), ~ str_detect(., "John"))) > 0)
V1 V2 V3
1 John Smith A V John Donovan
2 A A John Smith A R
3 A B A D John Donovan
4 John Donovan A O A V
5 A F John Smith A Q
Just to make the picture a bit clearer :
df %>% filter(print(across(everything(), ~ !str_detect(., "John"))))
# A tibble: 10 x 3
V1 V2 V3
<lgl> <lgl> <lgl>
1 FALSE TRUE FALSE
2 TRUE FALSE TRUE
3 TRUE TRUE FALSE
4 TRUE TRUE TRUE
5 FALSE TRUE TRUE
6 TRUE FALSE TRUE
7 TRUE TRUE TRUE
8 TRUE TRUE TRUE
9 TRUE TRUE TRUE
10 TRUE TRUE TRUE
V1 V2 V3
1 A C A R A L
2 A D A M A T
3 A F A V A N
4 A D A L A L
5 A C A Q A X
Notice that filter is &
(and)ing the booleans by row i.e only rows with all TRUE
value will be selected, those who have at least one FALSE
will not. Now let's take a look at the code you provided in your comment:
df %>% filter(print(across(everything(), ~ str_detect(., "John"))))
# A tibble: 10 x 3
V1 V2 V3
<lgl> <lgl> <lgl>
1 TRUE FALSE TRUE
2 FALSE TRUE FALSE
3 FALSE FALSE TRUE
4 FALSE FALSE FALSE
5 TRUE FALSE FALSE
6 FALSE TRUE FALSE
7 FALSE FALSE FALSE
8 FALSE FALSE FALSE
9 FALSE FALSE FALSE
10 FALSE FALSE FALSE
[1] V1 V2 V3
<0 rows> (or 0-length row.names)
All the rows have at least one FALSE
, thus no rows are selected.
Upvotes: 4
Reputation: 389275
Here are couple of base R approaches.
Using sapply
:
df[rowSums(sapply(df, grepl, pattern = 'John')) == 0, ]
# name1 name2 name3
#4 A C A R A L
#7 A D A M A T
#8 A F A V A N
#9 A D A L A L
#10 A C A Q A X
With lapply
:
df[!Reduce(`|`, lapply(df, grepl, pattern = 'John')), ]
Upvotes: 1