TomasDanke

Reputation: 11

Filter based on multiple conditions and variables

I'm working with survey data and need to filter on a set of multiple-response variables, 543 of them to be precise.

My data looks like this:

Q1 <- c(1,0,1,1)
Q2 <- c(0,1,0,0)
Q3 <- c(1,1,1,0)
Q4 <- c(0,0,0,0)
Q5 <- c(1,0,0,0)
DT <- data.frame(Q1,Q2,Q3,Q4,Q5)

I want to know how many people responded to at least one of these questions. Using the dplyr package, the code would be:

library(dplyr)
MR <- DT %>%
   filter(Q1 == 1 | Q2 == 1 | Q3 == 1 | Q4 == 1 | Q5 == 1 )

nrow(MR)

Basically, I'm trying to avoid writing out a long condition covering every variable from 1 to 543, like this:

library(dplyr)
MR <- DT %>%
   filter(Q1 == 1 | Q2 == 1 | Q3 == 1 | Q4 == 1 | Q5 == 1 | ... | Q543 == 1)

Is there a more efficient way to filter by so many variables?

Upvotes: 1

Views: 232

Answers (2)

hello_friend

Reputation: 5798

Base R one-liner:

DT[c(sort(unique(unlist(lapply(DT, function(x){which(x==1)}))))),]
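For readers who find the one-liner hard to follow, here is a minimal sketch of the same idea in base R, written with rowSums instead of which/unique. The names q_cols and keep are hypothetical helpers, and na.rm = TRUE is an assumption about how missing answers should be treated (as "not answered").

```r
# Sample data from the question
Q1 <- c(1, 0, 1, 1)
Q2 <- c(0, 1, 0, 0)
Q3 <- c(1, 1, 1, 0)
Q4 <- c(0, 0, 0, 0)
Q5 <- c(1, 0, 0, 0)
DT <- data.frame(Q1, Q2, Q3, Q4, Q5)

# Columns named "Q" followed only by digits (hypothetical helper name)
q_cols <- grepl("^Q[0-9]+$", names(DT))

# TRUE for every row with at least one selected column equal to 1;
# na.rm = TRUE treats missing answers as non-matches (an assumption)
keep <- rowSums(DT[, q_cols, drop = FALSE] == 1, na.rm = TRUE) > 0

MR <- DT[keep, ]   # the matching rows
sum(keep)          # how many respondents answered at least one question
```

With the sample data every row has at least one 1, so all four rows are kept.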

Upvotes: 1

akrun

Reputation: 887881

There are multiple ways to do this. One option is filter_at, where we specify the variables to select with one of the select_helpers: matches picks the column names that start (^) with "Q" followed by one or more digits (\\d+) up to the end ($) of the string. any_vars then builds the condition, keeping the rows that have at least one selected column equal to 1.

library(dplyr)
DT %>%
   filter_at(vars(matches("^Q\\d+$")), any_vars(.==1))
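In more recent dplyr releases the scoped verbs such as filter_at are marked as superseded, and the same filter can be written with if_any. The following is a sketch assuming dplyr 1.0.4 or later (where if_any is available) and the sample data from the question; it is not part of the original answer.

```r
library(dplyr)

# Sample data from the question
DT <- data.frame(
  Q1 = c(1, 0, 1, 1),
  Q2 = c(0, 1, 0, 0),
  Q3 = c(1, 1, 1, 0),
  Q4 = c(0, 0, 0, 0),
  Q5 = c(1, 0, 0, 0)
)

# Keep rows where any column matching ^Q<digits>$ equals 1
MR <- DT %>%
  filter(if_any(matches("^Q\\d+$"), ~ .x == 1))

nrow(MR)  # number of respondents with at least one answer
```

The column selection is the same matches() helper as above, so it scales to Q1 through Q543 without listing each column.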

Or use map and reduce. We loop through the selected columns with map, creating one logical vector per column, and reduce them to a single logical vector with |. That vector can then be passed to filter to subset the rows.

library(purrr)
DT %>%
   filter(map(select(., matches("^Q\\d+$")), `==`, 1) %>% 
             reduce(`|`))

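Another way is rowSums: comparing the selected columns with 1 gives a logical matrix, and a row sum above 0 means at least one column matched.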

DT %>%
   filter(rowSums(select(., matches("^Q\\d+$")) ==1) > 0)

Upvotes: 4
