Reputation: 123
I have two pandas dataframes of different lengths, and I need to conditionally overwrite the value from one into the other when two columns match.
df1.val = df2.val where df1.val == null and df1.key1 == df2.key1 and df1.key2 == df2.key2
df1
and df2
have very different sets of columns and row lengths, save for key1
, key2
, and val
. The only thing that's guaranteed is that for every (key1, key2)
in df1
, there is exactly one (key1, key2)
in df2
.
So far I've been going down the path of
df1.loc[df1.val.isnull(), "val"] = df2.val
trying to set indexes to match, but I haven't got anywhere on that.
I know there's some join-ons, but I haven't made any progress on that front either. Some syntactical help would be appreciated.
Edit for data:
DF1:
First Last Val Random1 ...
John Smith 4 x
Todd Smith 5 Nan
John Todd Nan z
DF2:
First Last Val Random2 ...
John Smith 7 4
Todd Smith 6 9
John Todd 3 3
Eric Smith 5 2
Result:
First Last Val Random1 ...OtherDF1Cols...
John Smith 4 x
Todd Smith 5 Nan
John Todd 3 z
Upvotes: 1
Views: 67
Reputation: 294488
set your indices first, then fillna
DF1.set_index(['First', 'Last']).fillna(DF2.set_index(['First', 'Last']))
Val
First Last
John Smith 4.0
Todd Smith 5.0
John Todd 3.0
Use combine_first
to include all stuff from both dataframes
DF1.set_index(['First', 'Last']).combine_first(DF2.set_index(['First', 'Last']))
Val
First Last
Eric Smith 5.0
John Smith 4.0
Todd 3.0
Todd Smith 5.0
Or, update just Val
column, and limit to just rows from first
d1 = DF1.set_index(['First', 'Last'])
d2 = DF2.set_index(['First', 'Last'])
print(d1.combine_first(d2[['Val']]).loc[d1.index].reset_index())
First Last Val
0 John Smith 4.0
1 Todd Smith 5.0
2 John Todd 3.0
An option using update
d1 = DF1.set_index(['First', 'Last'])
d2 = DF2.set_index(['First', 'Last'])
d1.update(d2.Val, overwrite=False)
d1.reset_index()
First Last Val
0 John Smith 4.0
1 Todd Smith 5.0
2 John Todd 3.0
Upvotes: 1