user3642360
user3642360

Reputation: 792

Extract first and last transaction date in R

I have a transaction data set as follows.

   id   purchase_date
    A   6/1/18
    A   6/2/18
    A   6/10/18
    A   6/15/18
    B   6/4/18
    B   6/5/18
    C   6/3/18
    C   6/7/18
    C   6/8/18
    C   6/9/18
    C   6/12/18
    D   6/21/18
    D   6/30/18
    D   7/1/18
    E   3/23/18

I want to take only first and last transaction for each id. My output should be as follows.

ID  FIRST_PURCHASE_DATE LAST_PURCHASE_DATE
A   6/1/18              6/15/18
B   6/4/18              6/5/18
C   6/3/18              6/12/18
D   6/21/18             7/1/18
E   3/23/18             3/23/18

My actual data has 200k unique ids. Any advice will be highly appreciated!

Upvotes: 0

Views: 549

Answers (1)

akrun
akrun

Reputation: 886938

We can use first and last from dplyr after grouping by 'ID'

library(dplyr)
df1 %>%
  group_by(ID) %>% 
  summarise(FIRST_PURCHASE_DATE = first(purchase_date),
            LAST_PURCHASE_DATE  = last(purchase_date))

The above assumes that 'purchase_date' is ordered by 'ID'. If it is not, arrange after converting to Date class and then take the first and last

library(lubridate)
df1 %>%
  arrange(ID, mdy(purchase_date)) %>%
  group_by(ID) %>% 
  summarise(FIRST_PURCHASE_DATE = first(purchase_date),
            LAST_PURCHASE_DATE  = last(purchase_date))

Upvotes: 3

Related Questions