Leokins
Leokins

Reputation: 89

Python Pandas How to join 2 or more dataframes based on a similar column

Hi I want to join 2 or more dataframes together based on a column lets say 'id' The column has similar and different IDs but I want to join/merge/concat/append them together so they are all in one big dataframe. Here is an example:

Df1:
id col1 col2
1  
2
4
5

Df2: 
id col3 col4
1
2
3
5

This is what I want:

Df3:
Id col1 col2 col3 col4
1
2
3
4
5

Upvotes: 3

Views: 201

Answers (1)

Alexander
Alexander

Reputation: 109528

Assuming no columns overlap other than the id column, you can merge them.

df1 = pd.DataFrame({'id': [1, 2, 4, 5], 'col1': list('ABCD'), 'col2': list('EFGH')})
df2 = pd.DataFrame({'id': [1, 2, 3, 5], 'col3': list('ABCD'), 'col4': list('EFGH')})

>>> df1.merge(df2, how='outer', on='id').set_index('id').sort_index()
   col1 col2 col3 col4
id                    
1     A    E    A    E
2     B    F    B    F
3   NaN  NaN    C    G
4     C    G  NaN  NaN
5     D    H    D    H

Note that concatenation does not work given your example:

>>> pd.concat([df1, df2], axis=1)
  col1 col2  id col3 col4  id
0    A    E   1    A    E   1
1    B    F   2    B    F   2
2    C    G   4    C    G   3
3    D    H   5    D    H   5

You can merge the dataframes if you first set the index before using concat. Here is a general solution for multiple dataframes:

dfs = (df1, df2)  # Add other dataframes as required.
>>> pd.concat([df.set_index('id') for df in dfs], axis=1)
   col1 col2 col3 col4
id                    
1     A    E    A    E
2     B    F    B    F
3   NaN  NaN    C    G
4     C    G  NaN  NaN
5     D    H    D    H

Note that if you have overlapping columns in your dataframe (e.g. col2), you would end up with something like this using pd.concat:

   col1 col2 col2 col4
id                    
1     A    E    A    E
2     B    F    B    F
3   NaN  NaN    C    G
4     C    G  NaN  NaN
5     D    H    D    H

Upvotes: 4

Related Questions