Chris Ruehlemann

Reputation: 21440

How to select columns depending on multiple conditions in dplyr

I'm looking for a solution in dplyr for the task of selecting columns of a dataframe based on multiple conditions. Say, we have this type of df:

X <- c("B", "C", "D", "E")
a1 <- c(1, 0, 3, 0)
a2 <- c(235, 270, 100, 1)
a3 <- c(3, 1000, 900, 2)
df1 <- data.frame(X, a1, a2, a3)

Let's further assume I want to select the column or columns that are numeric and whose values are all less than 5.

That is, in this case, what we want to select is column a1. How can this be done in dplyr? My understanding is that in order to select a column in dplyr you use select and, if that selection is governed by conditions, also where. But how to combine two such select(where...) statements? This, for example, is not the right way to do it as it throws an error:

df1 %>%
  select(where(is.numeric) & where(~ all(.) < 5))
Error: `where()` must be used with functions that return `TRUE` or `FALSE`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In all(.) : coercing argument of type 'character' to logical

Upvotes: 0

Views: 1172

Answers (2)

PaulS

Reputation: 25528

Another possible solution, based on dplyr::mutate:

library(dplyr)

df1 %>% 
  mutate(across(everything(), ~ if (is.numeric(.x) && all(.x < 5)) .x))

#>   a1
#> 1  1
#> 2  0
#> 3  3
#> 4  0

Or, more concisely (the character column X also fails the < 5 test, so it is dropped as well):

df1 %>% 
  mutate(across(everything(), ~ if (all(.x < 5)) .x))
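
As a quick check of why the shorter version still drops X: comparing a character vector with a number coerces the number to a string, so the test becomes a string comparison. The following is a minimal sketch, assuming a typical locale in which digits sort before letters; the name x_chr is only illustrative.

```r
# Hypothetical stand-in for the character column X
x_chr <- c("B", "C", "D", "E")

# 5 is coerced to "5", so this is a string comparison, not a numeric one
cmp <- x_chr < 5

# No element sorts before "5", so the column fails the all(... < 5) test
drops_by_comparison <- !all(cmp)

# An explicit type guard does not depend on string collation at all
passes_guard <- is.numeric(x_chr) && all(x_chr < 5)
```

If the character column happened to contain strings that sort before "5", the shorter version would keep it, so the explicit is.numeric() guard is the safer choice in general.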

Upvotes: 2

benson23

Reputation: 19142

Inside where(), we need to supply a function that returns a single TRUE or FALSE for each column.

library(dplyr)

select(df1, where(\(x) all(x < 5)))

# or this, which states the numeric condition explicitly
select(df1, where(\(x) is.numeric(x) && all(x < 5)))

  a1
1  1
2  0
3  3
4  0

Data

df1 <- structure(list(X = c("B", "C", "D", "E"), a1 = c(1, 0, 3, 0), 
    a2 = c(235, 270, 100, 1), a3 = c(3, 1000, 900, 2)), class = "data.frame", row.names = c(NA, 
-4L))
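
To connect this back to the error in the question, here is a small sketch of what seems to go wrong and how two where() calls can still be combined with &. It assumes the df1 defined above; the names bad and res are only illustrative.

```r
library(dplyr)

df1 <- data.frame(
  X  = c("B", "C", "D", "E"),
  a1 = c(1, 0, 3, 0),
  a2 = c(235, 270, 100, 1),
  a3 = c(3, 1000, 900, 2)
)

# In the question, all(.) < 5 calls all() on the column itself.
# For the character column X that gives NA (plus a coercion warning),
# and NA < 5 is NA, which where() rejects as neither TRUE nor FALSE.
bad <- suppressWarnings(all(df1$X) < 5)
is.na(bad)

# Moving the comparison inside all() yields TRUE or FALSE per column,
# so the two where() selections can be intersected with &.
res <- df1 %>%
  select(where(is.numeric) & where(~ all(.x < 5)))
names(res)  # "a1"
```

Putting both conditions in a single predicate with && has the advantage that the comparison is never evaluated on non-numeric columns.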

Upvotes: 3
