Reputation: 11
I have two spreadsheets with different number of rows and columns.
What I would like to do is to compare both and extract the values of A and B from a1.xlsx
that match the column names of a2.xlsx
and copy the values to columns C and D in a2.xlsx. I know how to do this in excel with Index-Match but not using Python's Pandas.
spreadsheet 1 (a1.xlsx
)
Index A B C
0 s 0.2 new york
1 d 1 vienna
2 g 2 london
3 c 3 tokyo
4 r 2 paris
5 d 1 berlin
6 a 8 madrid
7 f 10 seattle
spreadsheet 2 (a2.xlsx
)
Index A B C D
0 dublin 34 x x
1 seoul 36 x x
2 london 12 x x
3 berlin 4 x x
4 tokyo 6 x x
5 seatte 22 x x
Upvotes: 1
Views: 2646
Reputation: 394051
Assuming spreadsheet1 is loaded to pandas df and spreadsheet 2 is loaded to df1 then you can assign the values from the result of a merge
:
In [20]:
df1[['C','D']] = df1.merge(df, left_on='A', right_on='C',how='left')[['A_y','B_y']]
df1
Out[20]:
A B C D
Index
0 dublin 34 NaN NaN
1 seoul 36 NaN NaN
2 london 12 g 2
3 berlin 4 d 1
4 tokyo 6 c 3
5 seattle 22 f 10
Just to explain a little when we merge we perform a left merge and this will produce a clash of columns as we have column names that clash:
In [24]:
df1.merge(df, left_on='A', right_on='C',how='left')
Out[24]:
A_x B_x C_x D A_y B_y C_y
0 dublin 34 x x NaN NaN NaN
1 seoul 36 x x NaN NaN NaN
2 london 12 x x g 2 london
3 berlin 4 x x d 1 berlin
4 tokyo 6 x x c 3 tokyo
5 seattle 22 x x f 10 seattle
We are interested in just columns A_y
and B_y
, we perform a left merge because if we did the default merge type which is 'inner' then this will not align with the original df, i.e. the values will be shifted to the top of the df.
Upvotes: 1