Reputation: 77
I have a data frame with ten columns, but five columns of concern: A, B, C, D, E. I also have a list of values. What's the best way to subset the rows whose values in column A, B, C, D, OR, E is included in the list of values?
If I were only concerned with a single column, I know I can use left_join(list_of_values, df$A)
but I'm not sure how to do something similar with multiple columns.
Upvotes: 0
Views: 77
Reputation: 8110
The key here is if_any
.
library(tidyverse)
set.seed(26)
sample_df <- tibble(col = rep(LETTERS[1:8], each = 5),
val = sample(1:10, 40, replace = TRUE),
ID = rep(1:5, 8)) |>
pivot_wider(names_from = col, values_from = val)
sample_df
#> # A tibble: 5 x 9
#> ID A B C D E F G H
#> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 8 4 10 7 2 7 4 3
#> 2 2 3 2 3 3 4 10 2 3
#> 3 3 9 6 6 8 2 10 10 3
#> 4 4 7 6 8 9 3 5 8 3
#> 5 5 6 3 4 1 9 7 9 1
vals <- c(1, 7)
#solution
sample_df |>
filter(if_any(A:E, ~. %in% vals))
#> # A tibble: 3 x 9
#> ID A B C D E F G H
#> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 8 4 10 7 2 7 4 3
#> 2 4 7 6 8 9 3 5 8 3
#> 3 5 6 3 4 1 9 7 9 1
or any
and apply
with base R:
#base solution
indx <- apply(sample_df[,which(colnames(sample_df) %in% LETTERS[1:5])], 1, \(x) any(x %in% vals))
sample_df[indx,]
#> # A tibble: 3 x 9
#> ID A B C D E F G H
#> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 8 4 10 7 2 7 4 3
#> 2 4 7 6 8 9 3 5 8 3
#> 3 5 6 3 4 1 9 7 9 1
Upvotes: 3