How to merge duplicates as new columns

Question

I am attempting to merge to dataframes on 1 column for which I would like the output of duplicates to be an extra column instead of a new row.

What happens now:

df1 = pd.DataFrame({'A': ['A0'],
                     'B': ['B0']})

df2 = pd.DataFrame({'A': ['A0', 'A0'],
                     'C': ['C4', 'C5']})

df1.merge(df2, on = 'A', how = 'left')

Gives the output:

    A   B   C
0  A0  B0  C4
1  A0  B0  C5

What I would like the output to be:

    A   B   C_1   C_2
0  A0  B0  C4    C5

Thanks!

jezrael · Accepted Answer

Create unique values of column A in df2 by MultiIndex by DataFrame.set_index with counter column by GroupBy.cumcount, reshape by Series.unstack and flatten Multiindex by map with join::

df2 = df2.set_index(['A', df2.groupby('A').cumcount().add(1).astype(str)]).unstack()
df2.columns = df2.columns.map('_'.join)
df2 = df2.reset_index()
print (df2)
    A C_1 C_2
0  A0  C4  C5

df = df1.merge(df2, on = 'A', how = 'left')
print (df)
    A   B C_1 C_2
0  A0  B0  C4  C5

How to merge duplicates as new columns

Answers (2)

Related Questions