user1718097
user1718097

Reputation: 4292

Conditionally insert columns of one Pandas dataframe into columns of another dataframe

I have 2 dataframes:

dfA = pd.DataFrame({'label':[1,5,2,4,2,3],
                    'group':['A']*3 + ['B']*3,
                    'x':[np.nan]*3 + [1,2,3],
                    'y':[np.nan]*3 + [1,2,3]})

dfB = pd.DataFrame({'uniqid':[1,2,3,4,5,6,7],
                    'horizontal':[34,41,23,34,23,43,22],
                    'vertical':[98,67,19,57,68,88,77]})

...which look like:

   label group    x    y
0      1     A  NaN  NaN
1      5     A  NaN  NaN
2      2     A  NaN  NaN
3      4     B  1.0  1.0
4      2     B  2.0  2.0
5      3     B  3.0  3.0


   uniqid  horizontal  vertical
0       1          34        98
1       2          41        67
2       3          23        19
3       4          34        57
4       5          23        68
5       6          43        88
6       7          22        77

Basically, dfB contains 'horizontal' and 'vertical' values for all unique IDs. I want to populate the 'x' and 'y' columns in dfA with the 'horizontal' and 'vertical' values in dfB but only for group A; data for group B should remain unchanged.

The desired output would be:

   label group    x    y
0      1     A 34.0 98.0
1      5     A 23.0 68.0
2      2     A 41.0 67.0
3      4     B  1.0  1.0
4      2     B  2.0  2.0
5      3     B  3.0  3.0

I've used .merge() to add additional columns to the dataframe for both groups A and B and then copy data to x and y columns for group A only. And finally delete columns from dfB.

dfA = dfA.merge(dfB, how = 'left', left_on = 'label', right_on = 'uniqid')

dfA.loc[dfA['group'] == 'A','x'] = dfA.loc[dfA['group'] == 'A','horizontal'] 
dfA.loc[dfA['group'] == 'A','y'] = dfA.loc[dfA['group'] == 'A','vertical'] 

dfA = dfA[['label','group','x','y']]

The correct output is produced:

   label group     x     y
0      1     A  34.0  98.0
1      5     A  23.0  68.0
2      2     A  41.0  67.0
3      4     B   1.0   1.0
4      2     B   2.0   2.0
5      3     B   3.0   3.0

...but this is a really, really ugly solution. Is there a better solution?

Upvotes: 2

Views: 54

Answers (2)

piRSquared
piRSquared

Reputation: 294358

combine_first

dfA.set_index(['label', 'group']).combine_first(
    dfB.set_axis(['label', 'x', 'y'], axis=1).set_index(['label'])
).reset_index()

   label group     x     y
0      1     A  34.0  98.0
1      5     A  23.0  68.0
2      2     A  41.0  67.0
3      4     B   1.0   1.0
4      2     B   2.0   2.0
5      3     B   3.0   3.0

fillna

Works as well

dfA.set_index(['label', 'group']).fillna(
    dfB.set_axis(['label', 'x', 'y'], axis=1).set_index(['label'])
).reset_index()

Upvotes: 3

Quang Hoang
Quang Hoang

Reputation: 150765

We can try loc to extract/update only the part we want. And since you are merging on one column, which also has unique value on dfB, you can use set_index and loc/reindex:

mask = dfA['group']=='A'
dfA.loc[ mask, ['x','y']] = (dfB.set_index('uniqid')
                                .loc[dfA.loc[mask,'label'],
                                     ['horizontal','vertical']]
                                .values
                            )

Output:

   label group     x     y
0      1     A  34.0  98.0
1      5     A  23.0  68.0
2      2     A  41.0  67.0
3      4     B   1.0   1.0
4      2     B   2.0   2.0
5      3     B   3.0   3.0

Note that the above would fail if some of dfA.label is not in dfB.uniqueid. In which case, we need to use reindex:

(dfB.set_index('uniqid')
    .reindex[dfA.loc[mask,'label']
    [['horizontal','vertical']].values
)

Upvotes: 2

Related Questions