Reputation: 11
I'm working with survey data and need to filter on a large set of multiple-response variables, 543 to be precise.
My data look like this:
Q1 <- c(1,0,1,1)
Q2 <- c(0,1,0,0)
Q3 <- c(1,1,1,0)
Q4 <- c(0,0,0,0)
Q5 <- c(1,0,0,0)
DT <- data.frame(Q1,Q2,Q3,Q4,Q5)
I want to know how many people responded to at least one of these questions. With the dplyr package, the code would be:
library(dplyr)
MR <- DT %>%
  filter(Q1 == 1 | Q2 == 1 | Q3 == 1 | Q4 == 1 | Q5 == 1)
nrow(MR)
Basically, I'm trying to avoid writing out a long condition covering variables 1 through 543, like this:
library(dplyr)
MR <- DT %>%
filter(Q1 == 1 | Q2 == 1 | Q3 == 1 | Q4 == 1 | Q5 == 1 | ... | Q543 == 1)
Is there a more efficient way to filter by so many variables?
Upvotes: 1
Views: 232
Reputation: 5798
Base R one liner:
DT[c(sort(unique(unlist(lapply(DT, function(x){which(x==1)}))))),]
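The one-liner above collects, column by column, the row indices where the value is 1, then de-duplicates and sorts them. A minimal sketch of the same idea split into steps, followed by a vectorised rowSums variant, might look like this. It assumes that every column of DT is one of the Q variables (as in the question's example) and that there are no missing values; the names hits_per_column, rows_to_keep, MR_steps and MR_rowsums are made up for illustration.

```r
# Example data from the question
DT <- data.frame(
  Q1 = c(1, 0, 1, 1),
  Q2 = c(0, 1, 0, 0),
  Q3 = c(1, 1, 1, 0),
  Q4 = c(0, 0, 0, 0),
  Q5 = c(1, 0, 0, 0)
)

# Step 1: for each column, the row indices where the value is 1
hits_per_column <- lapply(DT, function(x) which(x == 1))

# Step 2: combine all indices, drop duplicates, and sort them
# so the kept rows stay in their original order
rows_to_keep <- sort(unique(unlist(hits_per_column)))
MR_steps <- DT[rows_to_keep, ]

# Alternative: compare the whole data frame with 1 at once
# and keep rows with at least one TRUE
MR_rowsums <- DT[rowSums(DT == 1) > 0, ]

nrow(MR_steps)
nrow(MR_rowsums)
```

In the example data every respondent answered at least one question, so both versions keep all four rows.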
Upvotes: 1
Reputation: 887881
There are multiple ways to do this. One option is filter_at, where we specify the variables to be selected with one of the select_helpers (matches, here matching column names that start (^) with "Q" followed by one or more digits (\\d+) until the end ($) of the string) and, with any_vars, create the logic. It keeps the rows that have at least one value equal to 1 in the selected columns:
library(dplyr)
DT %>%
filter_at(vars(matches("^Q\\d+$")), any_vars(.==1))
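In dplyr 1.0.4 and later, the scoped verbs such as filter_at are superseded, and the same filter can be written with if_any. The sketch below uses the question's example data with one all-zero row added (an addition for illustration only, not part of the original data) so that the filter actually drops something.

```r
library(dplyr)

# Question's example data plus a fifth respondent who answered nothing
# (the extra row is an assumption added for illustration)
DT <- data.frame(
  Q1 = c(1, 0, 1, 1, 0),
  Q2 = c(0, 1, 0, 0, 0),
  Q3 = c(1, 1, 1, 0, 0),
  Q4 = c(0, 0, 0, 0, 0),
  Q5 = c(1, 0, 0, 0, 0)
)

# Keep rows where any column named Q<digits> equals 1
MR <- DT %>%
  filter(if_any(matches("^Q\\d+$"), ~ .x == 1))

nrow(MR)
```

The selection helper is the same as in the filter_at version, so the call does not change when the data has 543 Q columns instead of five.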
Or using map and reduce. We loop through the selected columns with map, create a logical vector for each column, and reduce the resulting list to a single logical vector with |. This can be used in filter to filter the rows:
library(purrr)
DT %>%
filter(map(select(., matches("^Q\\d+$")), `==`, 1) %>%
reduce(`|`))
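To make the intermediate objects visible, the same map/reduce pipeline can be broken into separate steps. This is only a sketch: the names flags and keep are made up, and the all-zero fifth row is added to the question's data for illustration.

```r
library(dplyr)
library(purrr)

# Question's example data plus one all-zero row (added for illustration)
DT <- data.frame(
  Q1 = c(1, 0, 1, 1, 0),
  Q2 = c(0, 1, 0, 0, 0),
  Q3 = c(1, 1, 1, 0, 0),
  Q4 = c(0, 0, 0, 0, 0),
  Q5 = c(1, 0, 0, 0, 0)
)

# map() returns a list with one logical vector per selected column
flags <- DT %>%
  select(matches("^Q\\d+$")) %>%
  map(`==`, 1)

# reduce() combines the list pairwise with |, leaving one logical per row
keep <- reduce(flags, `|`)

# Base subsetting with the combined logical vector
MR <- DT[keep, ]
nrow(MR)
```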
Another way is rowSums: compare the selected columns with 1 and keep the rows where at least one comparison is TRUE
DT %>%
filter(rowSums(select(., matches("^Q\\d+$")) ==1) > 0)
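One caveat with the rowSums approach: the question's example has no missing values, but if the real survey data does contain NA (an assumption here), rowSums returns NA for any row with a missing response and filter then drops that row, even when another question was answered with 1. A hedged sketch of the difference, using a small made-up data frame DT_na:

```r
library(dplyr)

# Hypothetical data with missing responses (not from the question)
DT_na <- data.frame(
  Q1 = c(1, 0, NA),
  Q2 = c(NA, 0, 0),
  Q3 = c(0, 0, 0)
)

# Default: any NA in a row makes the row sum NA, so the row is dropped
strict <- DT_na %>%
  filter(rowSums(select(., matches("^Q\\d+$")) == 1) > 0)

# na.rm = TRUE ignores missing responses when counting the 1s
lenient <- DT_na %>%
  filter(rowSums(select(., matches("^Q\\d+$")) == 1, na.rm = TRUE) > 0)

nrow(strict)
nrow(lenient)
```

Here the first respondent answered Q1 with 1 but has a missing Q2, so only the na.rm = TRUE version keeps that row. Which behaviour is appropriate depends on how missing responses should be treated in the survey.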
Upvotes: 4