Baraliuh

Reputation: 2141

Dplyr filter with across returns unexpected results when using any

While migrating from the scoped filter variants to the new across syntax, I stumbled upon a peculiarity that I do not understand. I was trying to recreate filter_at with any_vars using filter with across and any. To my surprise, the latter did not behave as I expected.

Here is some example data:

library(dplyr)
ex_data <- tibble::tibble(
  a1 = runif(5),
  a2 = 1:5,
)

Now let's say we want to find rows where all variables are less than 2, then here are two versions I tried:

#Using across gives the expected result
ex_data %>% 
  filter(across(contains('a'), ~.<2))
# A tibble: 1 x 2
     a1    a2
  <dbl> <int>
1 0.944     1
#Using filter_at with all_vars gives the same result
ex_data %>% 
  filter_at(vars(contains('a')), all_vars(.<2))
# A tibble: 1 x 2
     a1    a2
  <dbl> <int>
1 0.944     1

Everything works as expected. Now, let's say we want to find rows where any variable is greater than 3. This is how I did it:

#Using across
ex_data %>% 
  filter(across(contains('a'), ~any(.>3)))
# A tibble: 0 x 2
# ... with 2 variables: a1 <dbl>, a2 <int>
#Using _at with any_vars
ex_data %>% 
  filter_at(vars(contains('a')), any_vars(.>3))
# A tibble: 2 x 2
      a1    a2
   <dbl> <int>
1 0.0346     4
2 0.741      5

I was quite surprised to find that using any with across returned a 0-row tibble. Am I misunderstanding how across works inside of filter? My last attempt was to use across inside of any. This did behave as I expected, but of course it does not return the correct output:

ex_data %>% 
  filter(any(across(contains('a'))>3))
# A tibble: 5 x 2
      a1    a2
   <dbl> <int>
1 0.944      1
2 0.0222     2
3 0.172      3
4 0.0346     4
5 0.741      5

Could someone clarify what is going on and how to make this work?

Upvotes: 1

Views: 131

Answers (1)

akrun

Reputation: 887511

We can use if_any here. if_any/if_all can be used in place of the scoped filter_at with any_vars/all_vars (tested with dplyr version 1.0.6).

library(dplyr)
ex_data %>% 
    filter(if_any(contains('a'), ~ . > 3))

-output

# A tibble: 2 x 2
     a1    a2
  <dbl> <int>
1 0.900     4
2 0.423     5

If we need the other rows, negate (!)

ex_data %>% 
     filter(!if_any(contains('a'), ~ . > 3))
# A tibble: 3 x 2
      a1    a2
   <dbl> <int>
1 0.536      1
2 0.0931     2
3 0.170      3
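
As for why the across attempts in the question behaved that way: any() collapses a whole column to a single TRUE/FALSE, and that single value is then recycled to every row. With ~any(. > 3), a1 never exceeds 3, so its condition is FALSE for all rows, and because filter combines the across columns with AND, no row survives. With any(across(...) > 3), the single TRUE is recycled, so every row is kept. Below is a minimal sketch with fixed values (made up here so the results are reproducible; they are not the original runif draws):

```r
library(dplyr)

# Illustrative data with fixed values (an assumption, not the original random draws)
ex_data <- tibble::tibble(
  a1 = c(0.9, 0.1, 0.2, 0.3, 0.7),
  a2 = 1:5
)

# any() summarises the whole column into one logical value
any_a1 <- any(ex_data$a1 > 3)  # FALSE: no a1 value exceeds 3
any_a2 <- any(ex_data$a2 > 3)  # TRUE: 4 and 5 exceed 3

# if_any() keeps the comparison element-wise and combines columns with OR, row by row
res_any <- ex_data %>%
  filter(if_any(contains("a"), ~ . > 3))

# if_all() is the AND counterpart, matching the all_vars() case from the question
res_all <- ex_data %>%
  filter(if_all(contains("a"), ~ . < 2))

# Base R equivalent of the if_any() call, for comparison
manual_any <- ex_data[ex_data$a1 > 3 | ex_data$a2 > 3, ]
```

Here res_any and manual_any both hold the rows with a2 equal to 4 and 5, while res_all holds only the first row.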

Upvotes: 1
