Reputation: 558
Consider the following data frames:
import pandas as pd
df1 = pd.DataFrame({'id': list('fghij'), 'A': ['A' + str(i) for i in range(5)]})
A id
0 A0 f
1 A1 g
2 A2 h
3 A3 i
4 A4 j
df2 = pd.DataFrame({'id': list('fg'), 'B': ['B' + str(i) for i in range(2)]})
B id
0 B0 f
1 B1 g
df3 = pd.DataFrame({'id': list('ij'), 'B': ['B' + str(i) for i in range(3, 5)]})
B id
0 B3 i
1 B4 j
I want to merge them to get
A id B
0 A0 f B0
1 A1 g B1
2 A2 h NaN
3 A3 i B3
4 A4 j B4
Inspired by this answer I tried
final = reduce(lambda l, r: pd.merge(l, r, how='outer', on='id'), [df1, df2, df3])
but unfortunately it yields
A id B_x B_y
0 A0 f B0 NaN
1 A1 g B1 NaN
2 A2 h NaN NaN
3 A3 i NaN B3
4 A4 j NaN B4
Additionally, I checked out this question but I can't adapt the solution to my problem. Also, I didn't find any options in the docs for pandas.merge to make this happen.
In my real world problem the list of data frames might be much longer and the size of the data frames might be much larger.
Is there any "pythonic" way to do this directly and without "postprocessing"? It would be perfect to have a solution that raises an exception if column B of df2 and df3 would overlap (so if there might be multiple candidates for some value in column B of the final data frame).
Upvotes: 1
Views: 108
Reputation: 402413
Consider pd.concat
+ groupby
?
pd.concat([df1, df2, df3], axis=0).groupby('id').first().reset_index()
id A B
0 f A0 B0
1 g A1 B1
2 h A2 NaN
3 i A3 B3
4 j A4 B4
Upvotes: 1