Reputation: 183
I have the following dataframe:
df=pd.DataFrame({'id':['A','A','B','C','D'],'Name':['apple','apricot','banana','orange','citrus'], 'count':[2,3,6,5,12]})
id Name count
0 A apple 2
1 A apricot 3
2 B banana 6
3 C orange 5
4 D citrus 12
I am trying to group the dataframe by the 'id' column, but also preserve the duplicated names as separate columns. Below is the expected output:
id sum(count) id1 id2
0 A 5 apple apricot
1 B 6 banana na
2 C 5 orange na
3 D 12 citrus na
I tried grouping by the id column using the following statement but that removes the name column completely.
df.groupby(['id'], as_index=False).sum()
I would appreciate any suggestions/ help.
Upvotes: 2
Views: 40
Reputation: 88236
You can use DataFrame.pivot_table
for this:
g = df.groupby('id')
# Generate the new columns of the pivoted dataframe
col = g.Name.cumcount()
# Sum of count grouped by id
sum_count = g['count'].sum()
(df.pivot_table(values='Name', index='id', columns = col, aggfunc='first')
.add_prefix('id')
.assign(sum_count = sum_count))
id0 id1 sum_count
id
A apple apricot 5
B banana NaN 6
C orange NaN 5
D citrus NaN 12
Upvotes: 2