Reputation: 411
I have two dataframes:
print (df1)
ID Birthday
0 A000 1990-01-01
1 A001 1991-05-05
2 A002 1970-10-01
3 A003 1980-07-07
4 A004 1945-08-15
print (df2)
ID Date from
0 A000 2010.01
1 A001 2012.01
2 A002 2010.01
3 A002 2010.01
4 A002 2010.11
5 A003 2009.05
6 A003 2010.01
7 A004 2010.01
8 A005 2007.11
9 A006 2017.01
df1 consists of IDs and and the birthday and df2 contains ID and the dates. Some of the values in df2.ID are not in df1.ID (i.e. A005 and A006).
What I am trying:
I'd like to calculate the difference between df1.Birthday and df2.Date if df2.ID exists in df1.ID.
What I have done so far:
df1['Birthday'] = pd.to_datetime(df1['Birthday'])
df2['Date from'] = pd.to_datetime(df2['Date from'])
x1 = df1.set_index(['ID'])['Birthday']
x2 = df2.set_index(['ID'])['Date from']
x3 = x2.sub(x1,fill_value=0)
print(x3)
ID
A000 -7305 days +00:00:00.000002
A001 -7794 days +00:00:00.000002
A002 -273 days +00:00:00.000002
A002 -273 days +00:00:00.000002
A002 -273 days +00:00:00.000002
A003 -3840 days +00:00:00.000002
A003 -3840 days +00:00:00.000002
A004 8905 days 00:00:00.000002
A005 0 days 00:00:00.000002
A006 0 days 00:00:00.000002
dtype: timedelta64[ns]
There is an error as ID A003 have a same value but it consists of different dates. I am not sure how I could go proceed to the next step. Thank you in advance for any assistance you can provide.
Upvotes: 1
Views: 255
Reputation: 2788
use dateutil package to get the diference in years, month, days:
from dateutil import relativedelta as rdelta
from datetime import date
d1 = date(2010,5,1)
d2 = date(2012,1,1)
rd = rdelta.relativedelta(d2,d1)
'
Upvotes: 0
Reputation: 68146
First, I would left merge the dataframes to make sure things were lining up properly. Then subtract the two date columns in a new column:
import pandas
from io import StringIO
data1 = StringIO("""\
ID Birthday
A000 1990-01-01
A001 1991-05-05
A002 1970-10-01
A003 1980-07-07
A004 1945-08-15
""")
data2 = StringIO("""\
ID Date_from
A000 2010.01
A001 2012.01
A002 2010.01
A002 2010.01
A002 2010.11
A003 2009.05
A003 2010.01
A004 2010.01
A005 2007.11
A006 2017.01
""")
x1 = pandas.read_table(data1, sep='\s+', parse_dates=['Birthday'])
x2 = pandas.read_table(data2, sep='\s+', parse_dates=['Date_from'])
data = (
x2.merge(right=x1, left_on='ID', right_on='ID', how='left')
.assign(Date_diff=lambda df: df['Date_from'] - df['Birthday'])
)
print(data)
And that gives me:
ID Date_from Birthday Date_diff
0 A000 2010-01-01 1990-01-01 7305 days
1 A001 2012-01-01 1991-05-05 7546 days
2 A002 2010-01-01 1970-10-01 14337 days
3 A002 2010-01-01 1970-10-01 14337 days
4 A002 2010-11-01 1970-10-01 14641 days
5 A003 2009-05-01 1980-07-07 10525 days
6 A003 2010-01-01 1980-07-07 10770 days
7 A004 2010-01-01 1945-08-15 23515 days
8 A005 2007-11-01 NaT NaT
9 A006 2017-01-01 NaT NaT
Upvotes: 1