Reputation: 609

How to subset in dplyr with the same conditions among multiple variables

I have a list of variables in sequence and I was wondering if there was a simple way of subsetting in dplyr with same conditions among the sequence of variables.

For example, I have the following variables:

DX1 DX2 DX3 DX4 DX5

I would like a subset of my data if any of those variables contains the following string '7586'.

Subsetting from a single variable I would do the following:

filter(df, DX1 == '7586')

The only method I can think of would be the following:

filter(df, DX1 == '7586' | DX2 == '7586' | DX3 == '7586' | DX4 == '7586' | DX5 == '7586')

My actual dataset contains DX1-DX25 and it can get quite tedious to write.

Is there a method of simplifying the above?

something along the lines of

filter(df, DX1-25 == '7586')

Thank you

Upvotes: 1

Answers (3)

thelatemail

Reputation: 93938

Two options, one in base R, and one in dplyr, should produce the same result:

df[rowSums(df == 7586) > 0,]

library(dplyr)
df %>% 
   filter(rowSums(. == 7586) > 0)

and a data.table version for fun, which should eek out a tiny bit more speed if necessary:

library(data.table)
setDT(df)[, .SD[Reduce(`+`, lapply(.SD, `==`, 7586) ) > 0] ]

Upvotes: 5

myincas

Reputation: 1550

select(data, DX1:DX25) %>% apply(., 1, function(x) any(x == '7586')) %>% data[. , ]

Upvotes: 1

akuiper

Reputation: 215137

Maybe something like, if DX1 ~ DX25 are all or most of the columns of your data.frame?

df[apply(df, 1, function(row) any(row == '7586')), ]

     x    y
1 7586  322
2 3322 7586

Data:

structure(list(x = structure(c(2L, 1L), .Label = c("3322", "7586"
), class = "factor"), y = structure(1:2, .Label = c("322", "7586"
), class = "factor")), .Names = c("x", "y"), row.names = c(NA, 
-2L), class = "data.frame")

Upvotes: 1

How to subset in dplyr with the same conditions among multiple variables

Answers (3)

Related Questions