user14057357
user14057357

Reputation: 59

Merging two dataframes when none of the columns are an exact match

I have two panda large dataframes that I would like to merge using the second column. The issue is that the first dataframe is longer than the second dataframe, so there are some names that won't be found in the second column. I would like to merge the two dataframes such that the rows that can't be merged just stay as they are and the ones that can be, the column 1 values are listed in a comma separated list. I have two short examples of the type of dataframe I am working with as well as the desired result.

DF 1

col 1   col 2
1       Johnny   
7       Cathy
4       Becky 
2       Sarah 
33      Courtney
83      Avery
78      Adam

DF 2

col 1   col 2
12      Johnny   
3       Cathy
13      Becky 
15      Sarah 
55      Adam

Desired result:

col 1      col 2
1, 12      Johnny   
7, 3       Cathy
4, 13      Becky 
2, 15      Sarah 
33         Courtney
83         Avery
78, 55     Adam

Upvotes: 0

Views: 593

Answers (1)

AlexK
AlexK

Reputation: 3011

If you don't need that specific row order in the output dataframe, you can just do this:

import pandas as pd

df1 = pd.DataFrame(
    {'col1': [1,7,4,2,33,83,78],
    'col2': ['Johnny', 'Cathy', 'Becky', 'Sarah', 'Courtney', 'Avery', 'Adam']}
)
df2 = pd.DataFrame(
    {'col1': [12,3,13,15,55],
    'col2': ['Johnny', 'Cathy', 'Becky', 'Sarah', 'Adam']}
)

df = pd.concat([df1, df2], axis=0).groupby('col2').agg(list).reset_index()
print(df)

#   col2    col1
# 0 Adam    [78, 55]
# 1 Avery   [83]
# 2 Becky   [4, 13]
# 3 Cathy   [7, 3]
# 4 Courtney    [33]
# 5 Johnny  [1, 12]
# 6 Sarah   [2, 15]

Upvotes: 2

Related Questions