Reputation: 4398
I have a dataset, df, where I would like to take a date diff in months from the present day (3/2/2021) and finally take its average
Data
A B C D
3/2/2015 3/2/2016 3/2/2010 3/2/2005
3/2/2000 3/2/2010 3/2/2005 4/2/2005
Desired
A B C D
72.00 60.00 132.00 192.00
252.00 132.00 192.00 193.00
162.00 96.00 162.00 192.50
Doing
import datetime
f_date = date(2015, 3, 2)
l_date = date(2021, 3, 2)
delta = l_date - f_date
print(delta.days)
However, how could I implement this to work on the full dataframe as a whole to give the desired output
Any suggestion is appreciated.
Upvotes: 1
Views: 116
Reputation: 9619
You can simply pass the datetimes to to_period('M')
and get the difference:
df = df.applymap(lambda x: (pd.to_datetime('today').to_period('M') - x.to_period('M')).n)
Output:
A | B | C | D | |
---|---|---|---|---|
0 | 72 | 60 | 132 | 192 |
1 | 252 | 132 | 192 | 191 |
Upvotes: 2
Reputation: 71689
You can use apply
+ pd.to_datetime
to parse the values in columns A
, B
, C
and D
as pandas datetime
objects, after the conversion subtract it from the today's date to get the dataframe containing the timedelta
objects, finally divide it by np.timedelta64(1, 'M')
to get the months
:
out = df.apply(pd.to_datetime, errors='coerce')
out = (pd.to_datetime('today').normalize() - out) / np.timedelta64(1, 'M')
>>> out
A B C D
0 72.050761 60.025873 132.043779 192.036797
1 252.029816 132.043779 192.036797 191.018296
In order to take average use mean
along axis=0
:
>>> out.mean()
A 162.040288
B 96.034826
C 162.040288
D 191.527547
dtype: float64
Upvotes: 3