Reputation: 89
Hi I want to join 2 or more dataframes together based on a column lets say 'id' The column has similar and different IDs but I want to join/merge/concat/append them together so they are all in one big dataframe. Here is an example:
Df1:
id col1 col2
1
2
4
5
Df2:
id col3 col4
1
2
3
5
This is what I want:
Df3:
Id col1 col2 col3 col4
1
2
3
4
5
Upvotes: 3
Views: 201
Reputation: 109528
Assuming no columns overlap other than the id
column, you can merge them.
df1 = pd.DataFrame({'id': [1, 2, 4, 5], 'col1': list('ABCD'), 'col2': list('EFGH')})
df2 = pd.DataFrame({'id': [1, 2, 3, 5], 'col3': list('ABCD'), 'col4': list('EFGH')})
>>> df1.merge(df2, how='outer', on='id').set_index('id').sort_index()
col1 col2 col3 col4
id
1 A E A E
2 B F B F
3 NaN NaN C G
4 C G NaN NaN
5 D H D H
Note that concatenation does not work given your example:
>>> pd.concat([df1, df2], axis=1)
col1 col2 id col3 col4 id
0 A E 1 A E 1
1 B F 2 B F 2
2 C G 4 C G 3
3 D H 5 D H 5
You can merge the dataframes if you first set the index before using concat. Here is a general solution for multiple dataframes:
dfs = (df1, df2) # Add other dataframes as required.
>>> pd.concat([df.set_index('id') for df in dfs], axis=1)
col1 col2 col3 col4
id
1 A E A E
2 B F B F
3 NaN NaN C G
4 C G NaN NaN
5 D H D H
Note that if you have overlapping columns in your dataframe (e.g. col2
), you would end up with something like this using pd.concat
:
col1 col2 col2 col4
id
1 A E A E
2 B F B F
3 NaN NaN C G
4 C G NaN NaN
5 D H D H
Upvotes: 4