Filtering a dataframe based on a sequence of variables

Question

I have a dataframe that has a sequence of variables, c1...c20. Each of these variables contains a code. I have a vector of codes, code.vec, and I would like to subset the dataframe to the records where any of c1, c2, c3, ..., c20 is in code.vec.

Example data (using only three of the cn variables):

code.vec <- c("T1", "T2", "T3", "T4")

c1 <- c("T1", "X1", "T6", "R5")
c2 <- c("R4", "C6", "C7", "X3")
c3 <- c("C5", "C2", "X4", "T2")

df <- data.frame(c1, c2, c3)

This is what I am currently doing:

library(dplyr)
df %>% filter(c1 %in% code.vec | c2 %in% code.vec | c3 %in% code.vec)

  c1 c2 c3
1 T1 R4 C5
2 R5 X3 T2

This works, but since the real dataframe has 20 cn variables, it means a lot of typing. It seems like there should be a simple apply or loop solution (this is easy to do in SAS with an array and a do loop), but I cannot work one out in R, and I can't find any similar questions here.
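For comparison with the SAS array/do-loop idea, here is a minimal base-R sketch of the loop approach. It assumes the code columns are named c1 to cN as in the example; `code.cols` and `keep` are illustrative names, not from the original question. It starts with no rows selected and ORs in the `%in%` test one column at a time.

```r
code.vec <- c("T1", "T2", "T3", "T4")
df <- data.frame(
  c1 = c("T1", "X1", "T6", "R5"),
  c2 = c("R4", "C6", "C7", "X3"),
  c3 = c("C5", "C2", "X4", "T2"),
  stringsAsFactors = FALSE
)

# Illustrative: the columns to check; with the real data use paste0("c", 1:20)
code.cols <- paste0("c", 1:3)

# Loop over the columns like a SAS array in a do loop,
# keeping a row as soon as any column's code is in code.vec
keep <- rep(FALSE, nrow(df))
for (col in code.cols) {
  keep <- keep | df[[col]] %in% code.vec
}

result <- df[keep, ]
result
```

The loop can also be collapsed to `Reduce(`|`, lapply(df[code.cols], `%in%`, code.vec))`, which builds the same logical vector. Note that base-R subsetting keeps the original row names (1 and 4 here), whereas dplyr's `filter()` renumbers them.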

acylam · Accepted Answer

Here is a simple solution using filter_all from dplyr:

library(dplyr)

df %>% 
  filter_all(any_vars(. %in% code.vec))

Result:

  c1 c2 c3
1 T1 R4 C5
2 R5 X3 T2
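`filter_all()` has since been superseded in dplyr. A sketch of the same filter with `if_any()` (available from dplyr 1.0.4) follows, assuming the code columns are named c1, c2, ... so that tidyselect's `num_range()` can select them; for the real data that would be `num_range("c", 1:20)`. Unlike `filter_all()`, this checks only the selected columns, so any other columns (an ID, for instance) are ignored.

```r
library(dplyr)

code.vec <- c("T1", "T2", "T3", "T4")
df <- data.frame(
  c1 = c("T1", "X1", "T6", "R5"),
  c2 = c("R4", "C6", "C7", "X3"),
  c3 = c("C5", "C2", "X4", "T2"),
  stringsAsFactors = FALSE
)

# if_any() applies the predicate to each selected column and keeps
# rows where at least one column returns TRUE
any_match <- df %>%
  filter(if_any(num_range("c", 1:3), ~ .x %in% code.vec))
any_match
```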

As mentioned in the comments, if you instead want the rows where every variable's code is in code.vec, replace any_vars with all_vars:

df %>% 
  filter_all(all_vars(. %in% code.vec))
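The `all_vars()` version has an `if_all()` counterpart, sketched here under the same column-naming assumption. With the example data no row has all three codes in code.vec, so the result is an empty data frame that still has the columns c1, c2 and c3.

```r
library(dplyr)

code.vec <- c("T1", "T2", "T3", "T4")
df <- data.frame(
  c1 = c("T1", "X1", "T6", "R5"),
  c2 = c("R4", "C6", "C7", "X3"),
  c3 = c("C5", "C2", "X4", "T2"),
  stringsAsFactors = FALSE
)

# if_all() keeps only rows where the predicate is TRUE for every selected column
all_match <- df %>%
  filter(if_all(num_range("c", 1:3), ~ .x %in% code.vec))
all_match
```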
