Reputation: 580
I'm trying to subset data according to conditions in different rows.
In other words, I'd like to subset customers
who visit store A
and store B
.
Here is an example:
dfrm <- data.frame(cust_id=sample(c(1:3),5,replace = TRUE),
store=sample(LETTERS[1:2],5,replace = TRUE))
> dfrm
cust_id store
>> 3 B
>> 3 A
>> 1 B
>> 1 B
>> 2 B
and I'd like to get a subset of cust_id
= 3, because he visit A
and B
.
data
dfrm <-
structure(list(cust_id = c(3, 3, 1, 1, 2), store = structure(c(2L,
1L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor")), .Names = c("cust_id", "store"),
row.names = c(NA, -5L), class = "data.frame")
Upvotes: 1
Views: 82
Reputation: 42592
For the sake of completeness, here are also two data.table
solutions:
The first one returns all rows of dfrm
of all customers who visited exactly two different stores:
library(data.table)
setDT(dfrm)[, .SD[uniqueN(store) == 2L], by = cust_id]
# cust_id store
#1: 3 B
#2: 3 A
This is what also the dplyr
solution does but about 25% faster for a data set of 80 k rows.
The second one returns all rows of dfrm
of all customers who visited the maximum number of different stores (which is also 2 in the given toy data sample):
setDT(dfrm)[, .SD[uniqueN(store) == uniqueN(dfrm[["store"]])], by = cust_id]
# cust_id store
#1: 3 B
#2: 3 A
This is what also the base R solution does but about 15 times faster for a data set of 80 k rows.
Upvotes: 2
Reputation: 3875
Using the dplyr
package,
you could do:
dfrm %>% group_by(cust_id) %>% filter(n_distinct(store,na.rm=T)==2)
Which returns the customers that visit two different stores:
cust_id store
<dbl> <fctr>
1 3 B
2 3 A
Upvotes: 1