Delete duplicates but keep the data of one column in a new column

Question

I have a dataframe with 2 columns:

Column1 Column2
A        1
B        1
A        2
B        2

I want to delete the duplicates in Column1 but keep the Column2 values of the deleted rows in a new column:

Column1 Column2 Column3
A        1       2
B        1       2

Anurag Dabas · Accepted Answer

Use groupby() + cumcount() to number the rows within each Column1 group, then pivot() on that counter:

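# key = 0 for the first occurrence of each Column1 value, 1 for the second, ...
df=(df.assign(key=df.groupby('Column1').cumcount())
      # pivot() arguments are keyword-only in pandas 2.0+
      .pivot(index='Column1',columns='key',values='Column2')
      .rename(columns=lambda x:f"Column{x+2}")
      .rename_axis(columns=None).reset_index())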

Or in 3 steps:

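# key starts at 2 so the new columns are named Column2, Column3, ...
df['key']=df.groupby('Column1').cumcount()+2
# pivot() arguments are keyword-only in pandas 2.0+
df=df.pivot(index='Column1',columns='key',values='Column2').add_prefix('Column')
df=df.rename_axis(columns=None).reset_index()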

Output of df:

    Column1     Column2     Column3
0   A           1           2
1   B           1           2
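
For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample dataframe from the question and wraps the same groupby() + cumcount() + pivot() chain in a function. The function name spread_duplicates and its parameters key_col and value_col are made up for illustration and are not part of pandas; the "Column" prefix and the +2 offset are simply carried over from the naming in the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Column1": ["A", "B", "A", "B"],
                   "Column2": [1, 1, 2, 2]})


def spread_duplicates(frame, key_col="Column1", value_col="Column2"):
    """Hypothetical helper: one row per key_col value, repeats spread into new columns."""
    # Position of each row within its group: 0 for the first occurrence, 1 for the next, ...
    position = frame.groupby(key_col).cumcount()
    return (
        frame.assign(key=position)
        # Keyword arguments are required by pivot() in pandas 2.0+
        .pivot(index=key_col, columns="key", values=value_col)
        # 0 -> Column2, 1 -> Column3, ... (naming assumed from the question)
        .rename(columns=lambda x: f"Column{x + 2}")
        .rename_axis(columns=None)
        .reset_index()
    )


result = spread_duplicates(df)
print(result)
```

If some value in Column1 occurs fewer times than the others, pivot() leaves the missing cells as NaN, so the extra columns may come out as floats rather than integers.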
