Reputation: 18545
Given the following dataframe
import pandas as pd
df = pd.DataFrame({'visited': ['2015-3-4', '2015-3-5','2015-3-6','2016-3-4', '2016-3-6', '2016-3-8'],'name':['John','John','John','Mary','Mary','Mary']})
df['visited']=pd.to_datetime(df['visited'])
visited name
0 2015-03-01 John
1 2015-03-05 John
2 2015-03-06 John
3 2016-03-04 Mary
4 2016-03-06 Mary
5 2016-03-08 Mary
I wish to calculate the average visited interval by dayfor two people, in this example, the outcome should be
avg_visited_interval name
0 1 John
1 2 Mary
How should I do it?
Upvotes: 3
Views: 35
Reputation: 862671
Use GroupBy.agg
with lambda function for difference by Series.diff
, convert timedeltas to days by Series.dt.days
, last use Series.reset_index
:
df = (df.groupby('name')['visited']
.agg(lambda x: x.diff().dt.days.mean())
.reset_index(name='avg_visited_interval'))
print (df)
name avg_visited_interval
0 John 1.0
1 Mary 2.0
Last if necessary change ordering of columns:
df = df[['avg_visited_interval','name']]
print (df)
avg_visited_interval name
0 1.0 John
1 2.0 Mary
Upvotes: 2
Reputation: 28332
You could use groupby.apply
together with diff
and mean
like this:
df.groupby('name')['visited'].apply(lambda x: x.diff().mean()).to_frame('avg_visited_interval')
Result:
name avg_visited_interval
John 1 days
Mary 2 days
The avg_visited_interval
column here will be in datetime type. To make the column an int (i.e., number of days) you can add dt.days
between diff()
and mean()
.
Upvotes: 3