Luc
Luc

Reputation: 747

Calculate days since date column

I want to calculate the number of days each customer last transacted BEFORE he/she subscribes. If the day subscribe date for each customer is the same, then I can filter out the trx date after the subscribe date, but customers have different subscribe dates.

Initial dataframe

ad = {'customer':['Clark','Stones','Fay','Stones','Clark','Clark','Clark'],
    'subscribe_date':['2020-11-30','2020-07-01','2021-01-02','2020-07-12','2020-11-30','2020-11-30','2020-11-30'],
    'trx_date':['2020-12-30','2020-07-12','2020-07-14','2020-07-25','2021-02-01','2020-09-01','2020-11-27'],
    'trx_amount':[100,90,50,45,20,30,50],
    }
ad = pd.DataFrame(ad)
ad.sort_values(by=['customer','trx_date'])

Expected dataframe

ad2 = {'customer':['Clark','Stones','Fay'],
    'subscribe_date':['2020-11-30','2020-07-01','2021-01-02'],
      'days_since_last_succ_tx_BEFORE_subs':['3','0','0']}
ad2 = pd.DataFrame(ad2)
ad2

Explanation: Clark has transacted 4 times. He subscribed on November 30th, 2020. The last day he transacted before his subscription was November 27th, 2020. Thus, the value is 3.

If the customer has never transacted before subscription, I would leave the value np.NaN.

Upvotes: 1

Views: 138

Answers (2)

jezrael
jezrael

Reputation: 862591

Use:

#convert to datetimes
ad['trx_date'] = pd.to_datetime(ad['trx_date'])
ad['subscribe_date'] = pd.to_datetime(ad['subscribe_date'])

#get days difference
ad['days'] =  ad['subscribe_date'].sub(ad['trx_date']).dt.days

#replace nagative to NaN
ad['days'] = ad['days'].mask(ad['days'].lt(0))

#get rows by minimal days per customer
cols = ['customer','subscribe_date','days']
df = ad.sort_values(['customer','days']).drop_duplicates('customer')[cols]
print (df)
  customer subscribe_date   days
6    Clark     2020-11-30    3.0
2      Fay     2021-01-02  172.0
1   Stones     2020-07-01    NaN

Upvotes: 2

Durtal
Durtal

Reputation: 1028

# Convert columns to datetime
ad['subscribe_date'] = pd.to_datetime(ad['subscribe_date'])
ad['trx_date'] = pd.to_datetime(ad['trx_date'])

# Calculate timedelta
ad['time_delta'] = ad['subscribe_date'] - ad['trx_date']

# mark negative timedeltas as invalid
mask_after_subcribe = ad['subscribe_date'].lt(ad['trx_date'] )
ad.loc[mask_after_subcribe , 'time_delta'] = pd.NaT

# groubby customer and return minmal value of time_delta
time_delta_minimal = ad.groupby('customer')['time_delta'].agg(min)

Upvotes: 1

Related Questions