Reputation: 1716

SQL style conditional joins in pandas

I have 2 dfs which look like this

df1:

ID  year  notes  score
12  2015  text   15.1
54  2014  text   18.4

df2:

id_num  year  score
12      2015  15.1
12      2014  12.9
54      2014  18.4

I'm trying to create a new df with all the data from df1 and only the score column from df where df1.year = df2.year+1. Like this:

ID  year   notes  score  prior_yr_score
12  2015   text   15.1   12.9

I'm reading through the pandas documentation, but am not finding a way to do this type of conditional joining. In sql I could just do

select a.*, b.score as prior_yr_score
from df1 as a left join df2 as b
on a.ID=b.id_num and a.year = b.year+1

Whereas in python I'm stuck at

merged=pd.merge(df1, df2, how='left',left_on='ID',right_on='id_num')

How can I do this in a single statement (pd.merge or otherwise)?

edit: I've read through some other posts and documentation about sql style joins in python, but have not been able to find a clear answer. For example, this post looks similar but in the answers it seems like the OP is actually trying to calculate aggregate measures by group with conditions as opposed to joining 2 dfs with conditions.

Upvotes: 1

Answers (2)

MaxU - stand with Ukraine

Reputation: 210832

In [92]: d1.merge(d2.assign(year=d2.year+1, prior_yr_score=d2.score).drop('score',1), left_on=['ID','year'], right_on=['id_num','year'])
Out[92]:
   ID  year notes  score  id_num  prior_yr_score
0  12  2015  text   15.1      12            12.9

Upvotes: 1

A.K. Ferrara

Reputation: 31

Can you just add a column to your df2 that does the year+1 calculation, then merge on that new column instead?

Upvotes: 0

SQL style conditional joins in pandas

Answers (2)

Related Questions