Reputation: 3123
I have a pandas datafame like this:
customer_id transaction_date ......................
0 12341 2011-05-06
1 12341 2011-06-22
2 12341 2011-05-09
3 12342 2011-06-16
4 12342 2011-07-26
5 12342 2011-03-06
6 12343 2011-05-16
7 12343 2011-04-25
8 12342 2011-07-12
9 12343 2011-05-09
10 12343 2011-04-12
11 12342 2011-07-03
I am trying to calculate the average interval a customer make transactions. In other words, every what number of days, does each customer make a transaction.
Here's what I have tried:
dummy_data.groupby(by=['customer_id'])['transaction_date'].transform('mean')
I get:
DataError: No numeric types to aggregate
How do I do this?
EDIT 1:
Expected output:
customer_id avg_days
12341 24
12342 ..
12343 ..
Explanation:
for customer with id "12341", date of first transaction is may-6, next may-9, next june 22. So, difference of days between first two transactions is 3 days, and between next transaction, 45 days. So, mean for 3 and 45 is (45+3/2 = 48/2 = 24). I would then need to do this for all customers in the dataframe.
Upvotes: 2
Views: 1333
Reputation: 13349
Considering your df:
def AVG_diff(x):
# Integer number of Days
return x.diff().mean()
df = df.sort_values(['customer_id', 'transaction_date'], ascending=[True, True]).reset_index(drop=True)
df['transaction_date'] = pd.to_datetime(df['transaction_date'])
df.groupby('customer_id')['transaction_date'].apply(lambda x: AVG_diff(x)).reset_index()
Result:
customer_id transaction_date
0 12341 23 days 12:00:00
1 12342 35 days 12:00:00
2 12343 11 days 08:00:00
Update:
x = df.groupby('customer_id')['transaction_date']
aggs = df.assign(avg_num=x.diff().dt.days) \
.groupby(['customer_id'], as_index=False)['avg_num'].mean().round()
aggs:
customer_id avg_num
0 12341 24.0
1 12342 36.0
2 12343 11.0
Upvotes: 3
Reputation: 34046
First convert transaction_date
column to pandas datetime
using pd.to_datetime
In [2947]: df.transaction_date = pd.to_datetime(df.transaction_date)
Now, add a new column in df
with days difference between two transcations using Groupby.shift
and find average using Groupby.mean
:
In [2976]: df['days'] = (df.sort_values('transaction_date').groupby('customer_id').transaction_date.shift() - df.transaction_date).dt.days.abs()
In [2985]: x = df.groupby('customer_id', as_index=False)['days'].mean().round()
In [2986]: x
Out[2986]:
customer_id days
0 12341 24.0
1 12342 36.0
2 12343 11.0
Upvotes: 4