Reputation: 1716
I have 2 dfs which look like this
df1:
ID year notes score
12 2015 text 15.1
54 2014 text 18.4
df2:
id_num year score
12 2015 15.1
12 2014 12.9
54 2014 18.4
I'm trying to create a new df with all the data from df1 and only the score column from df where df1.year = df2.year+1. Like this:
ID year notes score prior_yr_score
12 2015 text 15.1 12.9
I'm reading through the pandas documentation, but am not finding a way to do this type of conditional joining. In sql I could just do
select a.*, b.score as prior_yr_score
from df1 as a left join df2 as b
on a.ID=b.id_num and a.year = b.year+1
Whereas in python I'm stuck at
merged=pd.merge(df1, df2, how='left',left_on='ID',right_on='id_num')
How can I do this in a single statement (pd.merge
or otherwise)?
edit: I've read through some other posts and documentation about sql style joins in python, but have not been able to find a clear answer. For example, this post looks similar but in the answers it seems like the OP is actually trying to calculate aggregate measures by group with conditions as opposed to joining 2 dfs with conditions.
Upvotes: 1
Views: 364
Reputation: 210832
In [92]: d1.merge(d2.assign(year=d2.year+1, prior_yr_score=d2.score).drop('score',1), left_on=['ID','year'], right_on=['id_num','year'])
Out[92]:
ID year notes score id_num prior_yr_score
0 12 2015 text 15.1 12 12.9
Upvotes: 1
Reputation: 31
Can you just add a column to your df2 that does the year+1 calculation, then merge on that new column instead?
Upvotes: 0