Reputation: 609
I have a list of variables in sequence and I was wondering if there was a simple way of subsetting in dplyr with same conditions among the sequence of variables.
For example, I have the following variables:
DX1 DX2 DX3 DX4 DX5
I would like a subset of my data if any of those variables contains the following string '7586'.
Subsetting from a single variable I would do the following:
filter(df, DX1 == '7586')
The only method I can think of would be the following:
filter(df, DX1 == '7586' | DX2 == '7586' | DX3 == '7586' | DX4 == '7586' | DX5 == '7586')
My actual dataset contains DX1-DX25 and it can get quite tedious to write.
Is there a method of simplifying the above?
something along the lines of
filter(df, DX1-25 == '7586')
Thank you
Upvotes: 1
Views: 443
Reputation: 93938
Two options, one in base R, and one in dplyr
, should produce the same result:
df[rowSums(df == 7586) > 0,]
or
library(dplyr)
df %>%
filter(rowSums(. == 7586) > 0)
and a data.table
version for fun, which should eek out a tiny bit more speed if necessary:
library(data.table)
setDT(df)[, .SD[Reduce(`+`, lapply(.SD, `==`, 7586) ) > 0] ]
Upvotes: 5
Reputation: 1550
select(data, DX1:DX25) %>% apply(., 1, function(x) any(x == '7586')) %>% data[. , ]
Upvotes: 1
Reputation: 215137
Maybe something like, if DX1 ~ DX25 are all or most of the columns of your data.frame?
df[apply(df, 1, function(row) any(row == '7586')), ]
x y
1 7586 322
2 3322 7586
Data:
structure(list(x = structure(c(2L, 1L), .Label = c("3322", "7586"
), class = "factor"), y = structure(1:2, .Label = c("322", "7586"
), class = "factor")), .Names = c("x", "y"), row.names = c(NA,
-2L), class = "data.frame")
Upvotes: 1