Aytan
Aytan

Reputation: 136

filter row when multiple columns can be concerned

I have this data:

# A tibble: 20 x 6
      ID style param1 param2 param3 param4
   <dbl> <chr> <chr>  <chr>  <chr>  <chr> 
 1     1 ar    R78    NA     NA     NA    
 2     2 bg    NA     NA     NA     NA    
 3     3 bh    NA     NA     NA     NA    
 4     4 ar    NA     R78    NA     NA    
 5     5 bg    NA     NA     NA     NA    
 6     6 bh    NA     NA     NA     NA    
 7     7 ar    R78    NA     NA     NA    
 8     8 bg    NA     NA     R78    NA    
 9     9 bh    NA     NA     NA     NA    
10    10 ar    NA     R78    NA     NA    
11    11 bg    NA     NA     NA     NA    
12    12 bh    NA     NA     R78    NA    
13    13 ar    NA     NA     NA     NA    
14    14 bg    R78    NA     NA     NA    
15    15 bh    NA     NA     NA     NA    
16    16 ar    NA     NA     NA     NA    
17    17 bg    NA     NA     NA     NA    
18    18 bh    R78    NA     NA     NA    
19    19 ar    NA     NA     NA     R78   
20    20 bg    NA     NA     NA     NA 

I want to use dplyr::filter to select rows when R78 is in the column param1, param2, param3 or param4

I try:

data %>%
  filter(across(param1:param4) == "R78")

which returns me:

# A tibble: 4 x 6
     ID style param1 param2 param3 param4
  <dbl> <chr> <chr>  <chr>  <chr>  <chr> 
1     1 ar    R78    NA     NA     NA    
2     7 ar    R78    NA     NA     NA    
3    14 bg    R78    NA     NA     NA    
4    18 bh    R78    NA     NA     NA  

This is the same as when i do data %>% filter(param1 == "R78")

...

Maybe i misuse the "across" function. I've tried with multiples "|" but never work :/

What i expect to my code is it must return me a tibble with the row 1, 4, 7, 10, 12, 14; 18 and 19 only :/

Thnaks to you !

Upvotes: 2

Views: 67

Answers (3)

Ric S
Ric S

Reputation: 9247

Just my two cents here: this is another possible solution that follows the indications on dplyr's documentation when talking about the filter function. It says:

Previously, filter() was paired with the all_vars() and any_vars() helpers. Now, across() is equivalent to all_vars(), and there’s no direct replacement for any_vars(). However you can make a simple helper yourself.

Here I create the helper function rowAny which returns a logical vector according to which the condition x == "R78" is satisfied or not, then I apply it along the variables specified by across.

rowAny <- function(x) {rowSums(x == "R78", na.rm = TRUE) > 0}
df %>% filter(rowAny(across(param1:param4)))

# A tibble: 9 x 6
#      ID style param1 param2 param3 param4
#   <int> <chr> <chr>  <chr>  <chr>  <chr> 
# 1     1 ar    R78    NA     NA     NA    
# 2     4 ar    NA     R78    NA     NA    
# 3     7 ar    R78    NA     NA     NA    
# 4     8 bg    NA     NA     R78    NA    
# 5    10 ar    NA     R78    NA     NA    
# 6    12 bh    NA     NA     R78    NA    
# 7    14 bg    R78    NA     NA     NA    
# 8    18 bh    R78    NA     NA     NA    
# 9    19 ar    NA     NA     NA     R78 

Upvotes: 2

Chuck P
Chuck P

Reputation: 3923

There is of course the old fashioned uncomplicated but wordy solution of just expressing all the "ors".

mydf %>% filter(param1 == "R78" | param2 == "R78" | param3 == "R78" | param4 == "R78" )

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388817

across works column-wise. In such cases I think it is better to use filter_at :

library(dplyr)
df %>% filter_at(vars(param1:param4), any_vars(. == 'R78'))

#   ID style param1 param2 param3 param4
#1   1    ar    R78   <NA>   <NA>   <NA>
#4   4    ar   <NA>    R78   <NA>   <NA>
#7   7    ar    R78   <NA>   <NA>   <NA>
#8   8    bg   <NA>   <NA>    R78   <NA>
#10 10    ar   <NA>    R78   <NA>   <NA>
#12 12    bh   <NA>   <NA>    R78   <NA>
#14 14    bg    R78   <NA>   <NA>   <NA>
#18 18    bh    R78   <NA>   <NA>   <NA>
#19 19    ar   <NA>   <NA>   <NA>    R78

A hack to make across work is to use Reduce :

df %>% filter(Reduce(`|`, across(param1:param4, ~. == 'R78')))

In base R, you can use rowSums :

cols <- paste0('param', 1:4)
df[rowSums(df[cols] == 'R78', na.rm = TRUE) > 0, ]

Upvotes: 4

Related Questions