Prradep
Prradep

Reputation: 5716

filtering rows with same condition on more than one column

I would like to select rows based using the same condition across different columns.

set.seed(123)
df <- data.frame(col.x = sample(LETTERS[1:10], 20, replace = TRUE), 
                 col.y = sample(LETTERS[1:10], 20, replace = TRUE), 
                 val = rnorm(20))

I need to have only the values in ValsToRetain across two columns col.x, col.y.

ValsToRetain <- c('A','D', 'F','H','J')

I have tried these two approaches which give the same expected output.

df %>% filter(col.x %in% ValsToRetain) %>% filter(col.y %in% ValsToRetain)
df %>% filter(col.x %in% ValsToRetain & col.y %in% ValsToRetain)
#   col.x col.y        val
# 1     A     H -1.6866933
# 2     F     F  0.8377870
# 3     J     J  0.4264642
# 4     F     H  0.8781335
# 5     D     D -0.3059627

But, is there any other elegant way(s) to do this?

For example; like calculating rowSums across these columns to check if it has nas or not. As there are more than one value, I could not get a similar idea like rowSums(df[,1:2] == 'A').

Upvotes: 2

Views: 108

Answers (3)

lmo
lmo

Reputation: 38520

Here is a base R method using Reduce and lapply. The lapply applies %in% across the relevant columns, returning a list of logical vectors. Then Reduce combines these vectors into a single vector using &.

df[Reduce("&", lapply(df[c("col.x", "col.y")], "%in%", ValsToRetain)),]
   col.x col.y        val
6      A     H -1.6866933
7      F     F  0.8377870
11     J     J  0.4264642
14     F     H  0.8781335
19     D     D -0.3059627

If you have many columns for the comparison, then you could use grep to select them like grep("^col", names(df)) in place of c("col.x", "col.y").

Upvotes: 1

Orhan Yazar
Orhan Yazar

Reputation: 909

You can use subset, it's pretty short but the same way than your method:

filt <- subset(df,col.x %in% ValsToRetain & col.y %in% ValsToRetain)

The output is:

 col.x col.y        val
     A     H -1.6866933
     F     F  0.8377870
     J     J  0.4264642
     F     H  0.8781335
     D     D -0.3059627

Upvotes: -1

akrun
akrun

Reputation: 887891

We can use filter_at with all_vars

df %>% 
  filter_at(vars(starts_with("col")), all_vars(. %in% ValsToRetain))
#   col.x col.y        val
#1     A     H -1.6866933
#2     F     F  0.8377870
#3     J     J  0.4264642
#4     F     H  0.8781335
#5     D     D -0.3059627

Upvotes: 3

Related Questions