Reputation: 5716
I would like to select rows based using the same condition across different columns.
set.seed(123)
df <- data.frame(col.x = sample(LETTERS[1:10], 20, replace = TRUE),
col.y = sample(LETTERS[1:10], 20, replace = TRUE),
val = rnorm(20))
I need to have only the values in ValsToRetain
across two columns col.x
, col.y
.
ValsToRetain <- c('A','D', 'F','H','J')
I have tried these two approaches which give the same expected output.
df %>% filter(col.x %in% ValsToRetain) %>% filter(col.y %in% ValsToRetain)
df %>% filter(col.x %in% ValsToRetain & col.y %in% ValsToRetain)
# col.x col.y val
# 1 A H -1.6866933
# 2 F F 0.8377870
# 3 J J 0.4264642
# 4 F H 0.8781335
# 5 D D -0.3059627
But, is there any other elegant way(s) to do this?
For example; like calculating rowSums
across these columns to check if it has na
s or not. As there are more than one value, I could not get a similar idea like rowSums(df[,1:2] == 'A')
.
Upvotes: 2
Views: 108
Reputation: 38520
Here is a base R method using Reduce
and lapply
. The lapply
applies %in%
across the relevant columns, returning a list of logical vectors. Then Reduce
combines these vectors into a single vector using &
.
df[Reduce("&", lapply(df[c("col.x", "col.y")], "%in%", ValsToRetain)),]
col.x col.y val
6 A H -1.6866933
7 F F 0.8377870
11 J J 0.4264642
14 F H 0.8781335
19 D D -0.3059627
If you have many columns for the comparison, then you could use grep
to select them like grep("^col", names(df))
in place of c("col.x", "col.y")
.
Upvotes: 1
Reputation: 909
You can use subset, it's pretty short but the same way than your method:
filt <- subset(df,col.x %in% ValsToRetain & col.y %in% ValsToRetain)
The output is:
col.x col.y val
A H -1.6866933
F F 0.8377870
J J 0.4264642
F H 0.8781335
D D -0.3059627
Upvotes: -1
Reputation: 887891
We can use filter_at
with all_vars
df %>%
filter_at(vars(starts_with("col")), all_vars(. %in% ValsToRetain))
# col.x col.y val
#1 A H -1.6866933
#2 F F 0.8377870
#3 J J 0.4264642
#4 F H 0.8781335
#5 D D -0.3059627
Upvotes: 3