maxiloy
maxiloy

Reputation: 11

Compare two spreadsheets and extract the values

I have two spreadsheets with different number of rows and columns.

What I would like to do is to compare both and extract the values of A and B from a1.xlsx that match the column names of a2.xlsx and copy the values to columns C and D in a2.xlsx. I know how to do this in excel with Index-Match but not using Python's Pandas.

spreadsheet 1 (a1.xlsx)

Index   A   B   C
0   s   0.2 new york
1   d   1   vienna
2   g   2   london
3   c   3   tokyo
4   r   2   paris
5   d   1   berlin
6   a   8   madrid
7   f   10  seattle

spreadsheet 2 (a2.xlsx)

Index   A   B   C   D
0   dublin  34  x   x
1   seoul   36  x   x
2   london  12  x   x
3   berlin  4   x   x
4   tokyo   6   x   x
5   seatte  22  x   x

Upvotes: 1

Views: 2646

Answers (1)

EdChum
EdChum

Reputation: 394051

Assuming spreadsheet1 is loaded to pandas df and spreadsheet 2 is loaded to df1 then you can assign the values from the result of a merge:

In [20]:

df1[['C','D']] = df1.merge(df, left_on='A', right_on='C',how='left')[['A_y','B_y']]
df1
Out[20]:
             A   B    C   D
Index                      
0       dublin  34  NaN NaN
1        seoul  36  NaN NaN
2       london  12    g   2
3       berlin   4    d   1
4        tokyo   6    c   3
5      seattle  22    f  10

Just to explain a little when we merge we perform a left merge and this will produce a clash of columns as we have column names that clash:

In [24]:

df1.merge(df, left_on='A', right_on='C',how='left')
Out[24]:
       A_x  B_x C_x  D  A_y  B_y      C_y
0   dublin   34   x  x  NaN  NaN      NaN
1    seoul   36   x  x  NaN  NaN      NaN
2   london   12   x  x    g    2   london
3   berlin    4   x  x    d    1   berlin
4    tokyo    6   x  x    c    3    tokyo
5  seattle   22   x  x    f   10  seattle

We are interested in just columns A_y and B_y, we perform a left merge because if we did the default merge type which is 'inner' then this will not align with the original df, i.e. the values will be shifted to the top of the df.

Upvotes: 1

Related Questions