severin

Reputation: 2126

Link filter(across()) statements with | (OR) instead of & (AND)

How do I filter a data frame df for all rows where one or more of columns_to_check meet a condition? For example: in which rows is at least one cell NA?


df <- tibble(a = c('x', 'x', 'x'),
             b = c(NA,  'x', 'x'),
             c = c(NA,   NA, 'x'))
columns_to_check <- c('b', 'c')

Checking where all columns are not NA is straightforward:

library(tidyverse)
df %>%
  filter(across(all_of(columns_to_check), ~ !is.na(.x)))

#> # A tibble: 1 x 3
#>   a     b     c    
#>   <chr> <chr> <chr>
#> 1 x     x     x

But (how) can I combine the filter() statements created with across() using OR?

Upvotes: 3

Views: 414

Answers (3)

tmfmnk

Reputation: 40131

Another solution could be:

df %>%
 filter(across(all_of(columns_to_check), ~ !is.na(.x)) == TRUE)

  a     b     c    
  <chr> <chr> <chr>
1 x     x     <NA> 
2 x     x     x    

Upvotes: 0

severin

Reputation: 2126

My mistake, this is documented in vignette("rowwise"):

df %>%
  filter(rowSums(across(all_of(columns_to_check), ~ !is.na(.x))) > 0)
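To see why the rowSums() version acts as an OR, here is a minimal sketch that takes the same steps apart in plain R. It assumes the example df and columns_to_check from the question; the names not_na, n_not_na, keep_any and keep_all are made up for illustration.

```r
library(dplyr)

df <- tibble(a = c("x", "x", "x"),
             b = c(NA,  "x", "x"),
             c = c(NA,   NA, "x"))
columns_to_check <- c("b", "c")

# Logical matrix: TRUE where a checked cell is not NA
not_na <- !is.na(df[columns_to_check])

# Count of non-NA cells per row (TRUE counts as 1, FALSE as 0)
n_not_na <- rowSums(not_na)

# OR: at least one checked column passes the condition
keep_any <- as.vector(n_not_na > 0)
any_not_na <- df %>% filter(keep_any)

# AND: every checked column passes the condition
keep_all <- as.vector(n_not_na == length(columns_to_check))
all_not_na <- df %>% filter(keep_all)
```

Comparing the count against 0 gives OR, while comparing it against the number of checked columns gives the AND behaviour from the question.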

Upvotes: 2

Ian Campbell

Reputation: 24838

Here's an approach with reduce from purrr:

df %>%
  filter(reduce(.x = across(all_of(columns_to_check), ~ !is.na(.x)), .f = `|`))

This works because across returns a tibble, which is a list of logical vectors, each of length nrow(df).

You can see that behavior when you execute it in mutate:

df %>%
  mutate(across(all_of(columns_to_check), ~ !is.na(.x)))

# A tibble: 3 x 3
  a     b     c    
  <chr> <lgl> <lgl>
1 x     FALSE FALSE
2 x     TRUE  FALSE
3 x     TRUE  TRUE 

Therefore, you can reduce them together with | to get one logical vector. You don't need to spell out the argument names .x and .f; they are only there for illustrative purposes.
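As a rough illustration of the reduction step on its own, the sketch below applies reduce to a plain list holding the same logical vectors as the mutate output above. The names flags, any_true, all_true, via_reduce and via_if_any are hypothetical. The last part assumes dplyr 1.0.4 or later, where if_any() is available as a built-in way to express the same OR.

```r
library(dplyr)
library(purrr)

# Same logical columns as in the mutate() output
flags <- list(b = c(FALSE, TRUE, TRUE),
              c = c(FALSE, FALSE, TRUE))

# Element-wise OR across the list: TRUE where any column is TRUE
any_true <- reduce(flags, `|`)

# Element-wise AND across the list: TRUE only where all columns are TRUE
all_true <- reduce(flags, `&`)

df <- tibble(a = c("x", "x", "x"),
             b = c(NA,  "x", "x"),
             c = c(NA,   NA, "x"))
columns_to_check <- c("b", "c")

# Filtering with the reduced vector keeps rows 2 and 3
via_reduce <- df %>% filter(any_true)

# Assumption: dplyr >= 1.0.4, which provides if_any()
via_if_any <- df %>%
  filter(if_any(all_of(columns_to_check), ~ !is.na(.x)))
```

Swapping `|` for `&` in the reduce call gives back the AND behaviour that filter(across()) had in the question.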

Upvotes: 2
