d4tm4x
d4tm4x

Reputation: 558

pandas.merge with coinciding column names

Consider the following data frames:

import pandas as pd
df1 = pd.DataFrame({'id': list('fghij'), 'A': ['A' + str(i) for i in range(5)]})
    A id
0  A0  f
1  A1  g
2  A2  h
3  A3  i
4  A4  j
df2 = pd.DataFrame({'id': list('fg'), 'B': ['B' + str(i) for i in range(2)]})
    B id
0  B0  f
1  B1  g
df3 = pd.DataFrame({'id': list('ij'), 'B': ['B' + str(i) for i in range(3, 5)]})
    B id
0  B3  i
1  B4  j

I want to merge them to get

    A id    B
0  A0  f   B0
1  A1  g   B1
2  A2  h  NaN
3  A3  i   B3
4  A4  j   B4

Inspired by this answer I tried

final = reduce(lambda l, r: pd.merge(l, r, how='outer', on='id'), [df1, df2, df3])

but unfortunately it yields

    A id  B_x  B_y
0  A0  f   B0  NaN
1  A1  g   B1  NaN
2  A2  h  NaN  NaN
3  A3  i  NaN   B3
4  A4  j  NaN   B4

Additionally, I checked out this question but I can't adapt the solution to my problem. Also, I didn't find any options in the docs for pandas.merge to make this happen.

In my real world problem the list of data frames might be much longer and the size of the data frames might be much larger.

Is there any "pythonic" way to do this directly and without "postprocessing"? It would be perfect to have a solution that raises an exception if column B of df2 and df3 would overlap (so if there might be multiple candidates for some value in column B of the final data frame).

Upvotes: 1

Views: 108

Answers (1)

cs95
cs95

Reputation: 402413

Consider pd.concat + groupby?

pd.concat([df1, df2, df3], axis=0).groupby('id').first().reset_index()

  id   A    B
0  f  A0   B0
1  g  A1   B1
2  h  A2  NaN
3  i  A3   B3
4  j  A4   B4

Upvotes: 1

Related Questions