Sean Kramer
Sean Kramer

Reputation: 123

Conditionally join a pandas DF on multiple indexes

I have two pandas dataframes of different lengths, and I need to conditionally overwrite the value from one into the other when two columns match.

df1.val = df2.val where df1.val == null and df1.key1 == df2.key1 and df1.key2 == df2.key2

df1 and df2 have very different sets of columns and row lengths, save for key1, key2, and val. The only thing that's guaranteed is that for every (key1, key2) in df1, there is exactly one (key1, key2) in df2.

So far I've been going down the path of

df1.loc[df1.val.isnull(), "val"] = df2.val

trying to set indexes to match, but I haven't got anywhere on that.

I know there's some join-ons, but I haven't made any progress on that front either. Some syntactical help would be appreciated.

Edit for data:

DF1:

First  Last   Val Random1 ...
John   Smith    4 x
Todd   Smith    5 Nan
John   Todd   Nan z

DF2:

First  Last   Val Random2 ...
John   Smith    7 4
Todd   Smith    6 9
John   Todd     3 3
Eric   Smith    5 2

Result:

First  Last   Val Random1 ...OtherDF1Cols...
John   Smith    4 x
Todd   Smith    5 Nan
John   Todd     3 z

Upvotes: 1

Views: 67

Answers (1)

piRSquared
piRSquared

Reputation: 294488

set your indices first, then fillna

DF1.set_index(['First', 'Last']).fillna(DF2.set_index(['First', 'Last']))

             Val
First Last      
John  Smith  4.0
Todd  Smith  5.0
John  Todd   3.0

Use combine_first to include all stuff from both dataframes

DF1.set_index(['First', 'Last']).combine_first(DF2.set_index(['First', 'Last']))

             Val
First Last      
Eric  Smith  5.0
John  Smith  4.0
      Todd   3.0
Todd  Smith  5.0

Or, update just Val column, and limit to just rows from first

d1 = DF1.set_index(['First', 'Last'])
d2 = DF2.set_index(['First', 'Last'])
print(d1.combine_first(d2[['Val']]).loc[d1.index].reset_index())

  First   Last  Val
0  John  Smith  4.0
1  Todd  Smith  5.0
2  John   Todd  3.0

An option using update

d1 = DF1.set_index(['First', 'Last'])
d2 = DF2.set_index(['First', 'Last'])
d1.update(d2.Val, overwrite=False)
d1.reset_index()

  First   Last  Val
0  John  Smith  4.0
1  Todd  Smith  5.0
2  John   Todd  3.0

Upvotes: 1

Related Questions