Subtract dates if indexes match each other in Python Pandas

Question

I have two dataframes:

print (df1)
    ID      Birthday
0   A000    1990-01-01
1   A001    1991-05-05
2   A002    1970-10-01
3   A003    1980-07-07
4   A004    1945-08-15

print (df2)
    ID      Date from
0   A000    2010.01
1   A001    2012.01
2   A002    2010.01
3   A002    2010.01
4   A002    2010.11
5   A003    2009.05
6   A003    2010.01
7   A004    2010.01
8   A005    2007.11
9   A006    2017.01

df1 consists of IDs and and the birthday and df2 contains ID and the dates. Some of the values in df2.ID are not in df1.ID (i.e. A005 and A006).

What I am trying:

I'd like to calculate the difference between df1.Birthday and df2.Date if df2.ID exists in df1.ID.

What I have done so far:

df1['Birthday'] = pd.to_datetime(df1['Birthday'])
df2['Date from'] = pd.to_datetime(df2['Date from'])

x1 = df1.set_index(['ID'])['Birthday']
x2 = df2.set_index(['ID'])['Date from']
x3 = x2.sub(x1,fill_value=0)

print(x3)
ID
A000   -7305 days +00:00:00.000002
A001   -7794 days +00:00:00.000002
A002    -273 days +00:00:00.000002
A002    -273 days +00:00:00.000002
A002    -273 days +00:00:00.000002
A003   -3840 days +00:00:00.000002
A003   -3840 days +00:00:00.000002
A004     8905 days 00:00:00.000002
A005        0 days 00:00:00.000002
A006        0 days 00:00:00.000002
dtype: timedelta64[ns]

There is an error as ID A003 have a same value but it consists of different dates. I am not sure how I could go proceed to the next step. Thank you in advance for any assistance you can provide.

Paul H · Accepted Answer

First, I would left merge the dataframes to make sure things were lining up properly. Then subtract the two date columns in a new column:

import pandas
from io import StringIO

data1 = StringIO("""\
ID      Birthday
A000    1990-01-01
A001    1991-05-05
A002    1970-10-01
A003    1980-07-07
A004    1945-08-15
""")

data2 = StringIO("""\
ID      Date_from
A000    2010.01
A001    2012.01
A002    2010.01
A002    2010.01
A002    2010.11
A003    2009.05
A003    2010.01
A004    2010.01
A005    2007.11
A006    2017.01
""")

x1 = pandas.read_table(data1, sep='\s+', parse_dates=['Birthday'])
x2 = pandas.read_table(data2, sep='\s+', parse_dates=['Date_from'])


data = (
    x2.merge(right=x1, left_on='ID', right_on='ID', how='left')
      .assign(Date_diff=lambda df: df['Date_from'] - df['Birthday'])
)

print(data)

And that gives me:

     ID  Date_from   Birthday  Date_diff
0  A000 2010-01-01 1990-01-01  7305 days
1  A001 2012-01-01 1991-05-05  7546 days
2  A002 2010-01-01 1970-10-01 14337 days
3  A002 2010-01-01 1970-10-01 14337 days
4  A002 2010-11-01 1970-10-01 14641 days
5  A003 2009-05-01 1980-07-07 10525 days
6  A003 2010-01-01 1980-07-07 10770 days
7  A004 2010-01-01 1945-08-15 23515 days
8  A005 2007-11-01        NaT        NaT
9  A006 2017-01-01        NaT        NaT

Subtract dates if indexes match each other in Python Pandas

Answers (2)

Related Questions