Reputation: 2126
How do I filter a dataframe df for all rows where one or more of columns_to_check meet a condition? As an example: where is at least one cell NA?
df <- tibble(a = c('x', 'x', 'x'),
             b = c(NA, 'x', 'x'),
             c = c(NA, NA, 'x'))
columns_to_check <- c('b', 'c')
Checking where all columns are not NA is straightforward:
library(tidyverse)
df %>%
  filter(across(all_of(columns_to_check), ~ !is.na(.x)))
#> # A tibble: 1 x 3
#> a b c
#> <chr> <chr> <chr>
#> 1 x x x
But (how) can I combine the filter() statements created with across() using OR?
Upvotes: 3
Views: 414
Reputation: 40131
Another solution could be:
df %>%
  filter(across(all_of(columns_to_check), ~ !is.na(.x)) == TRUE)
a b c
<chr> <chr> <chr>
1 x x <NA>
2 x x x
Upvotes: 0
Reputation: 2126
My mistake, this is documented in vignette("rowwise"):
df %>%
  filter(rowSums(across(all_of(columns_to_check), ~ !is.na(.x))) > 0)
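For reference, here is a self-contained sketch of the rowSums() approach on the example data from the question. It also shows if_any(), which newer dplyr versions provide for combining per-column conditions with OR; this assumes dplyr >= 1.0.4 is installed. The names keep_rowsums and keep_if_any are made up for this sketch.

```r
library(dplyr)

df <- tibble(a = c('x', 'x', 'x'),
             b = c(NA, 'x', 'x'),
             c = c(NA, NA, 'x'))
columns_to_check <- c('b', 'c')

# Count the non-NA cells per row across the selected columns
# and keep the rows that have at least one.
keep_rowsums <- df %>%
  filter(rowSums(across(all_of(columns_to_check), ~ !is.na(.x))) > 0)

# if_any() (assumption: dplyr >= 1.0.4) combines the per-column
# results with OR, so a row is kept if any selected column is non-NA.
keep_if_any <- df %>%
  filter(if_any(all_of(columns_to_check), ~ !is.na(.x)))
```

Both calls should keep rows 2 and 3 of the example df. Replacing if_any() with if_all() would give the AND behaviour from the question.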
Upvotes: 2
Reputation: 24838
Here's an approach with reduce from purrr:
df %>%
  filter(reduce(.x = across(all_of(columns_to_check), ~ !is.na(.x)), .f = `|`))
This works because across returns a tibble, which is a list of logical vectors, each of length nrow(df). You can see that behavior when you execute it in mutate:
df %>%
  mutate(across(all_of(columns_to_check), ~ !is.na(.x)))
# A tibble: 3 x 3
a b c
<chr> <lgl> <lgl>
1 x FALSE FALSE
2 x TRUE FALSE
3 x TRUE TRUE
Therefore, you can reduce them together with | to get one logical vector. You don't need to name the .x or .f arguments; the names are only there for illustrative purposes.
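To make the reduce step concrete, here is a minimal sketch that folds the two logical columns from the mutate output above by hand. The list not_na and the result names are made up for illustration; the values are copied from the printed tibble.

```r
library(purrr)

# Per-column "is not NA" results for columns b and c of the example df.
not_na <- list(b = c(FALSE, TRUE, TRUE),
               c = c(FALSE, FALSE, TRUE))

# reduce() folds the list with `|`; with two elements this is
# the same as not_na$b | not_na$c.
any_not_na <- reduce(not_na, `|`)
any_not_na
#> [1] FALSE  TRUE  TRUE

# Swapping in `&` would instead require every column to be non-NA.
all_not_na <- reduce(not_na, `&`)
```

The resulting vector has one element per row, which is the shape filter() expects.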
Upvotes: 2