I have a DataFrame with two columns:
Column1 Column2
A 1
B 1
A 2
B 2
I want to drop the duplicate values in Column1, but keep the Column2 values of the dropped rows in a new column:
Column1 Column2 Column3
A 1 2
B 1 2
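To follow along, here is one way to build the example input. It is a minimal sketch that assumes pandas is imported as pd and that the frame is named df, which is the name the answers below use.

```python
import pandas as pd

# Example input from the question: each Column1 key appears twice
df = pd.DataFrame({
    "Column1": ["A", "B", "A", "B"],
    "Column2": [1, 1, 2, 2],
})
print(df)
```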
This should also work.
Column1 Column2
0 A 1
1 B 1
2 A 2
3 B 2
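import pandas as pd
# Pivot from long to wide. Column2's values double as the new column labels,
# which works here because every Column1 group holds the same values, 1 and 2.
pivoted_df = pd.pivot(df, index='Column1', columns='Column2', values='Column2')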
Column2 1 2
Column1
A 1 2
B 1 2
# Move Column1 out of the index and back into an ordinary column
pivoted_df.reset_index(inplace=True)
Column2 Column1 1 2
0 A 1 2
1 B 1 2
# Rename the value columns to the names asked for in the question
pivoted_df.rename(columns={1: 'Column2', 2: 'Column3'}, inplace=True)
# Remove the leftover 'Column2' name from the columns axis
pivoted_df.columns.name = None
Column1 Column2 Column3
0 A 1 2
1 B 1 2
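Collected into a single runnable script, the steps of this answer look roughly like this. It is a sketch that rebuilds the question's df inline and uses plain reassignment instead of inplace=True, which gives the same result.

```python
import pandas as pd

df = pd.DataFrame({"Column1": ["A", "B", "A", "B"], "Column2": [1, 1, 2, 2]})

# Long to wide: Column2's values (1 and 2) become the new column labels
pivoted_df = pd.pivot(df, index="Column1", columns="Column2", values="Column2")
# Move Column1 from the index back into an ordinary column
pivoted_df = pivoted_df.reset_index()
# Give the value columns the names asked for in the question
pivoted_df = pivoted_df.rename(columns={1: "Column2", 2: "Column3"})
# Drop the leftover "Column2" name on the columns axis
pivoted_df.columns.name = None
print(pivoted_df)
```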
Use groupby() + cumcount() to number each row's position within its Column1 group, and then pivot() on that position:
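# key = position of each row within its Column1 group (0, 1, ...)
df = (df.assign(key=df.groupby('Column1').cumcount())
        # pandas 2.0+ requires keyword arguments for pivot()
        .pivot(index='Column1', columns='key', values='Column2')
        .rename(columns=lambda x: f"Column{x+2}")
        .rename_axis(columns=None)
        .reset_index())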
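To see what cumcount() contributes, here is the intermediate frame just before the pivot. This sketch rebuilds the question's df inline; the name keyed is only for illustration. Each row gets its 0-based position within its Column1 group, and that position is what becomes the column key.

```python
import pandas as pd

df = pd.DataFrame({"Column1": ["A", "B", "A", "B"], "Column2": [1, 1, 2, 2]})

# cumcount() numbers the rows of each Column1 group 0, 1, 2, ... in order of appearance
keyed = df.assign(key=df.groupby("Column1").cumcount())
print(keyed)
```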
Or, in three steps:
df['key'] = df.groupby('Column1').cumcount() + 2  # positions 2, 3, ... become the column suffixes
df = df.pivot(index='Column1', columns='key', values='Column2').add_prefix('Column')
df = df.rename_axis(columns=None).reset_index()
Output of df:
Column1 Column2 Column3
0 A 1 2
1 B 1 2
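Because the cumcount() key depends only on row order and not on what Column2 contains, the same three steps also cope with keys that repeat a different number of times. This is a sketch with a hypothetical input (not from the question) where A appears three times and B twice; pivot() fills B's missing slot with NaN, which turns the value columns into floats.

```python
import pandas as pd

# Hypothetical input: A has three rows, B only two
df = pd.DataFrame({
    "Column1": ["A", "B", "A", "B", "A"],
    "Column2": [1, 1, 2, 2, 3],
})

df["key"] = df.groupby("Column1").cumcount() + 2
df = df.pivot(index="Column1", columns="key", values="Column2").add_prefix("Column")
df = df.rename_axis(columns=None).reset_index()
print(df)
```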