Manolo Dominguez Becerra

Reputation: 1363

Delete duplicates but keep the data of one column in a new column

I have a DataFrame with two columns:

Column1 Column2
A        1
B        1
A        2
B        2

I want to drop the duplicates in Column1 but keep the Column2 values of the dropped rows in a new column:

Column1 Column2 Column3
A        1       2
B        1       2
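For anyone who wants to reproduce this, the input frame above can be built as in the minimal sketch below. The variable name `df` is an assumption, chosen because the answers refer to the frame by that name.

```python
import pandas as pd

# Sample input from the question: each Column1 value appears twice.
df = pd.DataFrame({
    "Column1": ["A", "B", "A", "B"],
    "Column2": [1, 1, 2, 2],
})
print(df)
```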

Upvotes: 0

Views: 54

Answers (2)

Mike

Reputation: 367

This should also work. Starting from this DataFrame:

    Column1 Column2
   0    A   1
   1    B   1
   2    A   2
   3    B   2

# Pivot data to go from long to wide.
# Column2 supplies both the new column labels and the cell values.
pivoted_df = pd.pivot(df, index='Column1', columns='Column2', values='Column2')

Column2 1   2
Column1     
   A    1   2
   B    1   2

# Reset index so Column1 becomes a regular column again
pivoted_df.reset_index(inplace=True)

Column2 Column1 1   2
   0       A    1   2
   1       B    1   2

# Rename the value-labelled columns
pivoted_df.rename(columns={1: 'Column2', 2: 'Column3'}, inplace=True)

# Clear the leftover name of the columns axis ('Column2')
pivoted_df.columns.name = None

Column1 Column2 Column3
0   A      1    2
1   B      1    2
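The steps above can be put together into one runnable sketch. The variable name `wide` and the sample frame are assumptions added for illustration. Note that this approach labels the new columns by the values found in Column2, so the hard-coded rename only fits data where every group contains exactly the values 1 and 2.

```python
import pandas as pd

df = pd.DataFrame({"Column1": ["A", "B", "A", "B"], "Column2": [1, 1, 2, 2]})

# Long to wide: the distinct values of Column2 (1 and 2) become column labels.
wide = df.pivot(index="Column1", columns="Column2", values="Column2")

# Move Column1 out of the index, rename the value-labelled columns,
# and drop the leftover name of the columns axis.
wide = (
    wide.reset_index()
        .rename(columns={1: "Column2", 2: "Column3"})
        .rename_axis(columns=None)
)
print(wide)
```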

Upvotes: 0

Anurag Dabas

Reputation: 24324

Use groupby() + cumcount() to number each row within its Column1 group, then pivot() on that number:

# pivot() takes keyword arguments; pandas 2.0+ rejects positional ones
df = (df.assign(key=df.groupby('Column1').cumcount())
        .pivot(index='Column1', columns='key', values='Column2')
        .rename(columns=lambda x: f"Column{x+2}")
        .rename_axis(columns=None).reset_index())

Or in three steps:

df['key'] = df.groupby('Column1').cumcount() + 2
df = df.pivot(index='Column1', columns='key', values='Column2').add_prefix('Column')
df = df.rename_axis(columns=None).reset_index()

Output of df:

    Column1     Column2     Column3
0   A           1           2
1   B           1           2
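Because the position comes from cumcount() rather than from the values themselves, the same idea should carry over to groups of different sizes. The sketch below wraps it in a hypothetical helper, `spread_duplicates`, and runs it on an assumed frame in which A appears three times and B once. The helper name, the temporary `_pos` column and the sample data are illustrative only.

```python
import pandas as pd

def spread_duplicates(frame, key="Column1", value="Column2"):
    """Hypothetical helper: one row per `key`, repeats spread into numbered columns."""
    # Position of each row within its group, shifted so numbering starts at 2.
    pos = frame.groupby(key).cumcount() + 2
    return (
        frame.assign(_pos=pos)
             .pivot(index=key, columns="_pos", values=value)
             .add_prefix("Column")        # 2, 3, ... -> Column2, Column3, ...
             .rename_axis(columns=None)   # drop the "_pos" axis name
             .reset_index()
    )

# Assumed sample data with uneven groups.
uneven = pd.DataFrame({"Column1": ["A", "B", "A", "A"], "Column2": [1, 1, 2, 3]})
result = spread_duplicates(uneven)
print(result)
```

Groups with fewer rows get NaN in the columns they cannot fill, which also turns those columns into floats.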

Upvotes: 1
