Reputation: 300
I am trying to merge two DataFrame on Date and column called IDV. Here is my first DataFrame
df1
Date | IDV | Values |
---|---|---|
01/01/2020 | Var1 | 100 |
01/01/2020 | Var2 | 500 |
01/01/2020 | Var3 | 600 |
01/01/2020 | Var4 | 10 |
01/01/2020 | Var5 | 10 |
df2
Date | IDV | Values |
---|---|---|
01/01/2019 | Var2 | 110 |
01/01/2019 | Var3 | 510 |
01/01/2019 | Var1 | 300 |
01/01/2019 | Var5 | 20 |
01/01/2019 | Var4 | 20 |
my desired output would be
Date | IDV | Values | Last_Year_Values |
---|---|---|---|
01/01/2020 | Var1 | 100 | 300 |
01/01/2020 | Var2 | 500 | 110 |
01/01/2020 | Var3 | 600 | 510 |
01/01/2020 | Var4 | 10 | 20 |
01/01/2020 | Var5 | 10 | 20 |
I tried pd.merge(df1,df2,left_on ='date',right_on ='IDV', how = 'left')
Upvotes: 1
Views: 608
Reputation: 120391
Use DateOffset
if Date
is already datetime64
:
cols = ['Date', 'IDV', 'Values']
out = df1.merge(df2[cols].assign(Date=df2['Date']+pd.DateOffset(years=1)),
on=['Date', 'IDV'], how='left', suffixes=('', '_last_year'))
print(out)
# Output
Date IDV Values Values_last_year
0 2020-01-01 Var1 100 300
1 2020-01-01 Var2 500 110
2 2020-01-01 Var3 600 510
3 2020-01-01 Var4 10 20
4 2020-01-01 Var5 10 20
Setup:
import pandas as pd
d1 = {'Date': [pd.Timestamp('2020-01-01'),
pd.Timestamp('2020-01-01'),
pd.Timestamp('2020-01-01'),
pd.Timestamp('2020-01-01'),
pd.Timestamp('2020-01-01')],
'IDV': ['Var1', 'Var2', 'Var3', 'Var4', 'Var5'],
'Values': [100, 500, 600, 10, 10]}
df1 = pd.DataFrame(d1)
d2 = {'Date': [pd.Timestamp('2019-01-01'),
pd.Timestamp('2019-01-01'),
pd.Timestamp('2019-01-01'),
pd.Timestamp('2019-01-01'),
pd.Timestamp('2019-01-01')],
'IDV': ['Var2', 'Var3', 'Var1', 'Var5', 'Var4'],
'Values': [110, 510, 300, 20, 20]}
df2 = pd.DataFrame(d2)
Upvotes: 2
Reputation: 260300
Assuming a string type, a simple method would be to change the year in df2:
pd.merge(df1,
df2.assign(Date=df2['Date'].str.replace('2019', '2020')),
on=['Date', 'IDV'],
how='left', suffixes=('', '_last_year'))
or for a more generic method (works with any year):
pd.merge(df1,
df2.assign(Date=df2['Date'].str.replace(r'\d+$', lambda m: str(int(m.group(0))+1), regex=True)),
on=['Date', 'IDV'],
how='left', suffixes=('', '_last_year'))
output:
Date IDV Values Values_last_year
0 01/01/2020 Var1 100 300
1 01/01/2020 Var2 500 110
2 01/01/2020 Var3 600 510
3 01/01/2020 Var4 10 20
4 01/01/2020 Var5 10 20
Upvotes: 2