bbodek
bbodek

Reputation: 99

Pandas difference in days between today's date and another date_time column?

I am running an analysis on transaction data, and want to calculate how long it has been in days since the most recent transaction for each unique customer.

My initial data frame (df1) looks like:

memberId     transactionDate
1111         2017-12-01
2222         2017-12-03
1111         2017-12-05
3333         2017-12-06

TransactionDate is in datetime format.

This is my code trying to create a new dataframe with memberId and number of days since most recent transaction:

import datetime as dt
import pandas as pd

NOW=dt.datetime(2017,12,06)
gb=df1.groupby('memberId')
df2=gb.agg({'transactionDate': lambda x: (NOW-x.max())})           

This return as dataframe like:

memberId     transactionDate
1111         1 days
2222         3 days
3333         0 days

But I want the days to be returns as an integer rather then with days. My second try to fix this was:

NOW=dt.datetime(2017,12,06)
gb=df1.groupby('memberId')
df2=gb.agg({'transactionDate': lambda x: (NOW-x.max()).days})  

This returns something that looks like an epoch time for transactionDate, and output looks like:

memberId     transactionDate
1111         1970-01-01 00:00:00.000000001
2222         1970-01-01 00:00:00.000000003
3333         1970-01-01 00:00:00.000000000

I think it is returning an epoch timestamp with number of days as microseconds.

How can I get an output like:

memberId     transactionDate
1111         1
2222         3
3333         0

Upvotes: 1

Views: 4691

Answers (2)

csingh1992
csingh1992

Reputation: 21

Try using the following code

 df2.reset_index(inplace=True,drop=True)
 df2['Date2']=df2['Date'].apply(lambda x: x.days)

Upvotes: 0

MattR
MattR

Reputation: 5136

Try forcing the Transaction Date before doing the calculation. I know you said it was in DateTime Format, but my solution is working on pandas version 0.20.2:

import datetime as dt
import pandas as pd

df1['transactionDate'] = pd.to_datetime(df1['transactionDate']) # <--- changing here
#can also try df1['transactionDate'] = pd.to_datetime(df1['transactionDate'], format='%Y-%M-%d', errors='coerce')
# will try to force the data into this format

#df1['transactionDate'].dtype
#dtype('<M8[ns]')

NOW=dt.datetime(2017,12,6) 

# use NOW = dt.datetime.now() if you want to do this programmatically

gb=df1.groupby('memberId')
df2=gb.agg({'transactionDate': lambda x: (NOW-x.max()).days})  

output:

          transactionDate
memberId                 
1111                    1 #I get 1 for my answer since the max date of 1111 is 2017-12-05
2222                    3
3333                    0

Upvotes: 1

Related Questions