ds_student
ds_student

Reputation: 183

Groupby a column containing duplicates but also preserving the duplicate information

I have the following dataframe:

 df=pd.DataFrame({'id':['A','A','B','C','D'],'Name':['apple','apricot','banana','orange','citrus'], 'count':[2,3,6,5,12]})

   id     Name  count
 0  A    apple      2
 1  A  apricot      3
 2  B   banana      6
 3  C   orange      5
 4  D   citrus     12

I am trying to group the dataframe by the 'id' column, but also preserve the duplicated names as separate columns. Below is the expected output:

    id   sum(count)  id1      id2
 0  A    5           apple    apricot
 1  B    6           banana   na
 2  C    5           orange   na
 3  D    12          citrus   na

I tried grouping by the id column using the following statement but that removes the name column completely.

df.groupby(['id'], as_index=False).sum() 

I would appreciate any suggestions/ help.

Upvotes: 2

Views: 40

Answers (1)

yatu
yatu

Reputation: 88236

You can use DataFrame.pivot_table for this:

g = df.groupby('id')
# Generate the new columns of the pivoted dataframe
col = g.Name.cumcount()
# Sum of count grouped by id
sum_count = g['count'].sum()

(df.pivot_table(values='Name', index='id', columns = col, aggfunc='first')
   .add_prefix('id')
   .assign(sum_count = sum_count))

      id0     id1       sum_count
id                            
A    apple  apricot          5
B   banana      NaN          6
C   orange      NaN          5
D   citrus      NaN         12

Upvotes: 2

Related Questions