Reputation: 455
Assuming the following dataset has sorted list of dates:
dates=pd.DataFrame(data={'client':['1','2'],
'date':[['2012-3-10','2012-3-11','2012-3-12','2012-3-13','2012-3-14'],
['2012-3-12','2012-3-13','2012-3-16','2012-3-23']]})
I want to find the average date difference in terms of days
So, For eg, for Client
'2'
, Average Timelag
would be 2.75
Upvotes: 1
Views: 783
Reputation: 42875
Starting with:
client date
0 1 [2012-3-10, 2012-3-11, 2012-3-12, 2012-3-13, 2...
1 2 [2012-3-12, 2012-3-13, 2012-3-16, 2012-3-23]
You could
dates.groupby('client')['date'].apply(lambda x: [i / np.timedelta64(1, 'D') for i in np.diff([pd.to_datetime(c) for c in x])[0]])
to get the timedelta
in days
:
client
1 [1.0, 1.0, 1.0, 1.0]
2 [1.0, 3.0, 7.0]
or
dates.groupby('client')['date'].apply(lambda x: np.mean([i / np.timedelta64(1, 'D') for i in np.diff([pd.to_datetime(c) for c in x])[0]]))
for the mean
:
client
1 1.000000
2 3.666667
Upvotes: 2
Reputation: 85
This is a repeat of:
It looks like you can use the datetime modules to parse the date and/or time strings you're importing into objects which support algebra.
https://docs.python.org/2/library/datetime.html
Cheers
Upvotes: 0