Reputation: 99
I am running an analysis on transaction data, and want to calculate how long it has been in days since the most recent transaction for each unique customer.
My initial data frame (df1) looks like:
memberId transactionDate
1111 2017-12-01
2222 2017-12-03
1111 2017-12-05
3333 2017-12-06
TransactionDate is in datetime format.
This is my code trying to create a new dataframe with memberId and number of days since most recent transaction:
import datetime as dt
import pandas as pd
NOW=dt.datetime(2017,12,06)
gb=df1.groupby('memberId')
df2=gb.agg({'transactionDate': lambda x: (NOW-x.max())})
This return as dataframe like:
memberId transactionDate
1111 1 days
2222 3 days
3333 0 days
But I want the days to be returns as an integer rather then with days. My second try to fix this was:
NOW=dt.datetime(2017,12,06)
gb=df1.groupby('memberId')
df2=gb.agg({'transactionDate': lambda x: (NOW-x.max()).days})
This returns something that looks like an epoch time for transactionDate, and output looks like:
memberId transactionDate
1111 1970-01-01 00:00:00.000000001
2222 1970-01-01 00:00:00.000000003
3333 1970-01-01 00:00:00.000000000
I think it is returning an epoch timestamp with number of days as microseconds.
How can I get an output like:
memberId transactionDate
1111 1
2222 3
3333 0
Upvotes: 1
Views: 4691
Reputation: 21
Try using the following code
df2.reset_index(inplace=True,drop=True)
df2['Date2']=df2['Date'].apply(lambda x: x.days)
Upvotes: 0
Reputation: 5136
Try forcing the Transaction Date before doing the calculation. I know you said it was in DateTime Format, but my solution is working on pandas version 0.20.2
:
import datetime as dt
import pandas as pd
df1['transactionDate'] = pd.to_datetime(df1['transactionDate']) # <--- changing here
#can also try df1['transactionDate'] = pd.to_datetime(df1['transactionDate'], format='%Y-%M-%d', errors='coerce')
# will try to force the data into this format
#df1['transactionDate'].dtype
#dtype('<M8[ns]')
NOW=dt.datetime(2017,12,6)
# use NOW = dt.datetime.now() if you want to do this programmatically
gb=df1.groupby('memberId')
df2=gb.agg({'transactionDate': lambda x: (NOW-x.max()).days})
output:
transactionDate
memberId
1111 1 #I get 1 for my answer since the max date of 1111 is 2017-12-05
2222 3
3333 0
Upvotes: 1