Reputation: 4292
I have 2 dataframes:
dfA = pd.DataFrame({'label':[1,5,2,4,2,3],
'group':['A']*3 + ['B']*3,
'x':[np.nan]*3 + [1,2,3],
'y':[np.nan]*3 + [1,2,3]})
dfB = pd.DataFrame({'uniqid':[1,2,3,4,5,6,7],
'horizontal':[34,41,23,34,23,43,22],
'vertical':[98,67,19,57,68,88,77]})
...which look like:
label group x y
0 1 A NaN NaN
1 5 A NaN NaN
2 2 A NaN NaN
3 4 B 1.0 1.0
4 2 B 2.0 2.0
5 3 B 3.0 3.0
uniqid horizontal vertical
0 1 34 98
1 2 41 67
2 3 23 19
3 4 34 57
4 5 23 68
5 6 43 88
6 7 22 77
Basically, dfB contains 'horizontal' and 'vertical' values for all unique IDs. I want to populate the 'x' and 'y' columns in dfA with the 'horizontal' and 'vertical' values in dfB but only for group A; data for group B should remain unchanged.
The desired output would be:
label group x y
0 1 A 34.0 98.0
1 5 A 23.0 68.0
2 2 A 41.0 67.0
3 4 B 1.0 1.0
4 2 B 2.0 2.0
5 3 B 3.0 3.0
I've used .merge() to add additional columns to the dataframe for both groups A and B and then copy data to x and y columns for group A only. And finally delete columns from dfB.
dfA = dfA.merge(dfB, how = 'left', left_on = 'label', right_on = 'uniqid')
dfA.loc[dfA['group'] == 'A','x'] = dfA.loc[dfA['group'] == 'A','horizontal']
dfA.loc[dfA['group'] == 'A','y'] = dfA.loc[dfA['group'] == 'A','vertical']
dfA = dfA[['label','group','x','y']]
The correct output is produced:
label group x y
0 1 A 34.0 98.0
1 5 A 23.0 68.0
2 2 A 41.0 67.0
3 4 B 1.0 1.0
4 2 B 2.0 2.0
5 3 B 3.0 3.0
...but this is a really, really ugly solution. Is there a better solution?
Upvotes: 2
Views: 54
Reputation: 294358
combine_first
dfA.set_index(['label', 'group']).combine_first(
dfB.set_axis(['label', 'x', 'y'], axis=1).set_index(['label'])
).reset_index()
label group x y
0 1 A 34.0 98.0
1 5 A 23.0 68.0
2 2 A 41.0 67.0
3 4 B 1.0 1.0
4 2 B 2.0 2.0
5 3 B 3.0 3.0
fillna
Works as well
dfA.set_index(['label', 'group']).fillna(
dfB.set_axis(['label', 'x', 'y'], axis=1).set_index(['label'])
).reset_index()
Upvotes: 3
Reputation: 150765
We can try loc
to extract/update only the part we want. And since you are merging on one column, which also has unique value on dfB
, you can use set_index
and loc/reindex
:
mask = dfA['group']=='A'
dfA.loc[ mask, ['x','y']] = (dfB.set_index('uniqid')
.loc[dfA.loc[mask,'label'],
['horizontal','vertical']]
.values
)
Output:
label group x y
0 1 A 34.0 98.0
1 5 A 23.0 68.0
2 2 A 41.0 67.0
3 4 B 1.0 1.0
4 2 B 2.0 2.0
5 3 B 3.0 3.0
Note that the above would fail if some of dfA.label
is not in dfB.uniqueid
. In which case, we need to use reindex
:
(dfB.set_index('uniqid')
.reindex[dfA.loc[mask,'label']
[['horizontal','vertical']].values
)
Upvotes: 2