Reputation: 7730
Here is my dataframe:
import pandas as pd
df_manual = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                          'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                          'C': ['12/15/2011', '11/11/2001', '08/7/2015', '07/3/1999', '03/03/2000']})
I would like to create a column containing the date difference for column C within each group of column A. Here is what I wrote:
df_manual['C'] = pd.to_datetime(df_manual['C'])
df_manual['diff'] = df_manual.groupby(['A'])['C'].transform(lambda x: x.diff())
But the result is not a difference in days: the difference between 2001-11-11 and 2000-03-03 comes out as the date 1971-09-11, while I need the number of days in between.
Any idea how to achieve it?
Upvotes: 2
Views: 2632
Reputation: 33773
Use apply instead of transform:
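# group_keys=False keeps the original row index on pandas >= 2.0, so the result aligns with df_manual
df_manual['diff'] = df_manual.groupby(['A'], group_keys=False)['C'].apply(lambda x: x.diff())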
The resulting output:
     A   B          C       diff
0  one  Ar 2011-12-15        NaT
1  one  Br 2001-11-11 -3686 days
2  two  Cr 2015-08-07        NaT
3  two  Ar 1999-07-03 -5879 days
4  one  Ar 2000-03-03  -618 days
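Not part of the original answer: GroupBy objects also provide a diff method that computes the difference to the previous row within each group directly and keeps the original row index, so the result can be assigned back without apply or transform. A minimal sketch, assuming a reasonably recent pandas; the explicit format='%m/%d/%Y' argument is an added assumption matching the month/day/year strings in the question:

```python
import pandas as pd

df_manual = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                          'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                          'C': ['12/15/2011', '11/11/2001', '08/7/2015', '07/3/1999', '03/03/2000']})
# Parse explicitly as month/day/year (assumed format of the input strings)
df_manual['C'] = pd.to_datetime(df_manual['C'], format='%m/%d/%Y')

# GroupBy.diff: difference to the previous row within the same group of A,
# aligned to the original index; the first row of each group is NaT
df_manual['diff'] = df_manual.groupby('A')['C'].diff()
print(df_manual)
```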
If you want df_manual['diff'] to hold a number of days instead of a timedelta, use the dt.days accessor (the NaT rows become NaN, so the column is float rather than integer):
df_manual['diff'] = df_manual.groupby(['A'], group_keys=False)['C'].apply(lambda x: x.diff()).dt.days
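A minimal sketch of two follow-up conversions that are not from the original answer (it assumes pandas >= 1.0 for the nullable "Int64" dtype, and uses GroupBy.diff to get the timedeltas). Because the first row of each group is NaT, dt.days yields float64 with NaN; casting to "Int64" keeps true integers with <NA> for the missing rows. Dividing by pd.Timedelta(days=1) is another common way to express a timedelta as a number of days:

```python
import pandas as pd

df_manual = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                          'C': pd.to_datetime(['2011-12-15', '2001-11-11', '2015-08-07',
                                               '1999-07-03', '2000-03-03'])})
# Within-group timedelta to the previous row (NaT for the first row of each group)
delta = df_manual.groupby('A')['C'].diff()

# dt.days: whole days as float64, NaN where delta is NaT
df_manual['days_float'] = delta.dt.days
# Nullable integer dtype keeps missing values as <NA> instead of forcing float
df_manual['days_int'] = delta.dt.days.astype('Int64')
# Division by a one-day Timedelta gives days as float (fractional if times are present)
df_manual['days_div'] = delta / pd.Timedelta(days=1)
print(df_manual)
```

For date-only values like these, all three columns agree; they differ only when the timestamps carry a time of day, since dt.days returns the whole-day component while division keeps the fraction.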
Upvotes: 4