subset data with conditions over different rows

Question

I'm trying to subset data according to conditions that span different rows. In other words, I'd like to keep only the customers who visited both store A and store B.

Here is an example:

dfrm <- data.frame(cust_id=sample(c(1:3),5,replace = TRUE), 
                   store=sample(LETTERS[1:2],5,replace = TRUE))

> dfrm
  cust_id store
1       3     B
2       3     A
3       1     B
4       1     B
5       2     B

I'd like to get the subset with cust_id = 3, because that customer visited both A and B.

data

dfrm <-
structure(list(cust_id = c(3, 3, 1, 1, 2), store = structure(c(2L, 
1L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor")), .Names = c("cust_id", "store"),
row.names = c(NA, -5L), class = "data.frame")

Lamia · Accepted Answer

Using the dplyr package, you could do:

library(dplyr)

dfrm %>% group_by(cust_id) %>% filter(n_distinct(store, na.rm = TRUE) == 2)

This returns the customers who visited two distinct stores. Since the data contain only stores A and B, those are the customers who visited both:

  cust_id store
1       3     B
2       3     A
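
Counting distinct stores only works while A and B are the only stores in the data. If a third store could appear, a customer who visited, say, B and C would also pass the `== 2` test. As a hedged sketch (not part of the original answer), the required stores can be named explicitly. The version below uses base R only; `required`, `keep_ids`, and `result` are illustrative names, and the data frame is rebuilt from the question's values.

```r
# Example data from the question
dfrm <- data.frame(
  cust_id = c(3, 3, 1, 1, 2),
  store   = factor(c("B", "A", "B", "B", "B"), levels = c("A", "B"))
)

# Stores that a customer must have visited (assumed requirement: both A and B)
required <- c("A", "B")

# For each required store, collect the ids of customers who visited it,
# then intersect those id sets to keep customers present in all of them
keep_ids <- Reduce(
  intersect,
  lapply(required, function(s) dfrm$cust_id[dfrm$store == s])
)

# Keep every row belonging to the qualifying customers
result <- dfrm[dfrm$cust_id %in% keep_ids, ]
print(result)
```

With dplyr, the same idea would be to replace the grouped filter condition with `all(c("A", "B") %in% store)`.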
