Reputation: 792
I have a transaction data set as follows.
id purchase_date
A 6/1/18
A 6/2/18
A 6/10/18
A 6/15/18
B 6/4/18
B 6/5/18
C 6/3/18
C 6/7/18
C 6/8/18
C 6/9/18
C 6/12/18
D 6/21/18
D 6/30/18
D 7/1/18
E 3/23/18
I want to take only first and last transaction for each id. My output should be as follows.
ID FIRST_PURCHASE_DATE LAST_PURCHASE_DATE
A 6/1/18 6/15/18
B 6/4/18 6/5/18
C 6/3/18 6/12/18
D 6/21/18 7/1/18
E 3/23/18 3/23/18
My actual data has 200k unique ids. Any advice will be highly appreciated!
Upvotes: 0
Views: 549
Reputation: 886938
We can use first
and last
from dplyr
after grouping by 'ID'
library(dplyr)
df1 %>%
group_by(ID) %>%
summarise(FIRST_PURCHASE_DATE = first(purchase_date),
LAST_PURCHASE_DATE = last(purchase_date))
The above assumes that 'purchase_date' is ordered by 'ID'. If it is not, arrange
after converting to Date
class and then take the first
and last
library(lubridate)
df1 %>%
arrange(ID, mdy(purchase_date)) %>%
group_by(ID) %>%
summarise(FIRST_PURCHASE_DATE = first(purchase_date),
LAST_PURCHASE_DATE = last(purchase_date))
Upvotes: 3