Reputation: 35
Here is my dataframe:
Id | Category | Hours |
---|---|---|
1 | A | 1 |
1 | A | 3 |
1 | B | 4 |
2 | A | 2 |
2 | B | 6 |
3 | A | 3 |
And here is the output I want:
Id | Total Hours | A_Hours | B_Hours |
---|---|---|---|
1 | 5 | 4 | 4 |
2 | 8 | 2 | 6 |
3 | 3 | 3 | 0 |
How do I achieve this?
I tried various methods of grouping and aggregation, even calculating the A_hours series separately and appending it to the dataframe but I didn't find a way to both calculate the zeros ( based on absence of hours under a particular category) and maintain the order.
Upvotes: 1
Views: 97
Reputation: 260300
Use a pivot_table
:
out = (df.pivot_table(index='Id', columns='Category', values='Hours',
aggfunc='sum', fill_value=0,
margins=True, margins_name='Total')
.add_suffix('_Hours')
.drop('Total').reset_index().rename_axis(columns=None)
)
Output:
Id A_Hours B_Hours Total_Hours
0 1 4 4 8
1 2 2 6 8
2 3 3 0 3
Upvotes: 1
Reputation: 67
Complementing the answer given by @Michael Cao, you can perfom the .loc to create the "Total Hours" per row:
summary = df.groupby(['Id', 'Category'])[['Hours']].sum().reset_index().pivot(index = 'Id', columns = 'Category', values = 'Hours').fillna(0)
summary.loc[:,'Total Hours'] = summary.sum(axis=1)
Upvotes: 1
Reputation: 3609
Perform a groupby by both Id and Category to get the hours per category and then pivot.
df.groupby(['Id', 'Category'])[['Hours']].sum().reset_index().pivot(index = 'Id', columns = 'Category', values = 'Hours').fillna(0)
Upvotes: 1