william007
william007

Reputation: 18545

Average visited interval for different people

Given the following dataframe

import pandas as pd
df = pd.DataFrame({'visited': ['2015-3-4', '2015-3-5','2015-3-6','2016-3-4', '2016-3-6', '2016-3-8'],'name':['John','John','John','Mary','Mary','Mary']})
df['visited']=pd.to_datetime(df['visited'])
       visited  name
0   2015-03-01  John
1   2015-03-05  John
2   2015-03-06  John
3   2016-03-04  Mary
4   2016-03-06  Mary
5   2016-03-08  Mary

I wish to calculate the average visited interval by dayfor two people, in this example, the outcome should be

  avg_visited_interval  name
0                    1  John
1                    2  Mary

How should I do it?

Upvotes: 3

Views: 35

Answers (2)

jezrael
jezrael

Reputation: 862671

Use GroupBy.agg with lambda function for difference by Series.diff, convert timedeltas to days by Series.dt.days, last use Series.reset_index:

df = (df.groupby('name')['visited']
        .agg(lambda x: x.diff().dt.days.mean())
        .reset_index(name='avg_visited_interval'))
print (df)
   name  avg_visited_interval
0  John                   1.0
1  Mary                   2.0

Last if necessary change ordering of columns:

df = df[['avg_visited_interval','name']]
print (df)
   avg_visited_interval  name
0                   1.0  John
1                   2.0  Mary

Upvotes: 2

Shaido
Shaido

Reputation: 28332

You could use groupby.apply together with diff and mean like this:

df.groupby('name')['visited'].apply(lambda x: x.diff().mean()).to_frame('avg_visited_interval')

Result:

name    avg_visited_interval
John                  1 days
Mary                  2 days

The avg_visited_interval column here will be in datetime type. To make the column an int (i.e., number of days) you can add dt.days between diff() and mean().

Upvotes: 3

Related Questions