Reputation: 317
I have a dataframe with 3 columns: ID_CUST, ID_TRANSACTION, TRANSACTION_MONTH.
Let's imagine the following dataframe:
ID_CUSTOMER ID_TRANSACTION TRANSACTION_MONTH
2 43 2
2 94 5
2 99 8
3 102 4
5 121 12
I first select only transactions for a period:
df_subset <- DM_TRAFFIC %>% filter(MONTH >=2 & MONTH <=5)
ID_CUSTOMER ID_TRANSACTION TRANSACTION_MONTH
2 43 2
2 94 5
3 102 4
From those transactions, I then wish to keep all transactions from customers who made a transaction in Month 5.
ID_CUSTOMER ID_TRANSACTION TRANSACTION_MONTH
2 43 2
2 94 5
I'm a bit confused on how to tackle this step .
If you have the solution in dplyr or datatable, it would be even more appreciated!
Thank you!
Upvotes: 1
Views: 849
Reputation: 28441
After your first subset, use group_by
to filter by each ID_CUSTOMER
. Then use filter
and any
to look for a group that has at least one value satisfying the condition.
dplyr
DM_TRAFFIC %>%
filter(TRANSACTION_MONTH >=2 & TRANSACTION_MONTH <=5) %>%
group_by(ID_CUSTOMER) %>%
filter(any(TRANSACTION_MONTH == 5))
#Source: local data frame [2 x 3]
#Groups: ID_CUSTOMER [1]
#
# ID_CUSTOMER ID_TRANSACTION TRANSACTION_MONTH
# (int) (int) (int)
#1 2 43 2
#2 2 94 5
data.table
setDT(DM_TRAFFIC)[TRANSACTION_MONTH >= 2 & TRANSACTION_MONTH <= 5][
, if(any(TRANSACTION_MONTH == 5)) .SD,by = ID_CUSTOMER]
# ID_CUSTOMER ID_TRANSACTION TRANSACTION_MONTH
#1: 2 43 2
#2: 2 94 5
Upvotes: 3