Naveen Reddy Marthala
Naveen Reddy Marthala

Reputation: 3123

pandas groupby a column, then get average number of days from a date column for each group

I have a pandas datafame like this:

    customer_id    transaction_date   ......................
 0    12341          2011-05-06
 1    12341          2011-06-22
 2    12341          2011-05-09
 3    12342          2011-06-16
 4    12342          2011-07-26
 5    12342          2011-03-06
 6    12343          2011-05-16
 7    12343          2011-04-25
 8    12342          2011-07-12
 9    12343          2011-05-09
 10   12343          2011-04-12
 11   12342          2011-07-03

I am trying to calculate the average interval a customer make transactions. In other words, every what number of days, does each customer make a transaction.

Here's what I have tried:

dummy_data.groupby(by=['customer_id'])['transaction_date'].transform('mean')

I get:

DataError: No numeric types to aggregate

How do I do this?

EDIT 1:

Expected output:

customer_id   avg_days
  12341         24
  12342         ..
  12343         ..

Explanation:
for customer with id "12341", date of first transaction is may-6, next may-9, next june 22. So, difference of days between first two transactions is 3 days, and between next transaction, 45 days. So, mean for 3 and 45 is (45+3/2 = 48/2 = 24). I would then need to do this for all customers in the dataframe.

Upvotes: 2

Views: 1333

Answers (2)

Pygirl
Pygirl

Reputation: 13349

Considering your df:

def AVG_diff(x):
    # Integer number of Days
    return x.diff().mean()

df = df.sort_values(['customer_id', 'transaction_date'], ascending=[True, True]).reset_index(drop=True)
df['transaction_date'] = pd.to_datetime(df['transaction_date'])
df.groupby('customer_id')['transaction_date'].apply(lambda x: AVG_diff(x)).reset_index()

Result:

   customer_id transaction_date
0        12341 23 days 12:00:00
1        12342 35 days 12:00:00
2        12343 11 days 08:00:00

Update:

x = df.groupby('customer_id')['transaction_date']
aggs = df.assign(avg_num=x.diff().dt.days) \
               .groupby(['customer_id'], as_index=False)['avg_num'].mean().round()

aggs:

   customer_id  avg_num
0        12341     24.0
1        12342     36.0
2        12343     11.0

Upvotes: 3

Mayank Porwal
Mayank Porwal

Reputation: 34046

First convert transaction_date column to pandas datetime using pd.to_datetime

In [2947]: df.transaction_date = pd.to_datetime(df.transaction_date)

Now, add a new column in df with days difference between two transcations using Groupby.shift and find average using Groupby.mean:

In [2976]: df['days'] = (df.sort_values('transaction_date').groupby('customer_id').transaction_date.shift() - df.transaction_date).dt.days.abs()

In [2985]: x = df.groupby('customer_id', as_index=False)['days'].mean().round()

In [2986]: x
Out[2986]: 
   customer_id  days
0        12341  24.0
1        12342  36.0
2        12343  11.0

Upvotes: 4

Related Questions