Reputation: 4408
I have a DataFrame, a, and I want to group by type and date and sum the size for each group:
date size type
1/1/2020 1 a
1/1/2020 1 a
1/1/2020 1 a
1/1/2020 2 b
1/1/2020 5 b
1/1/2020 6 b
1/1/2020 1 c
2/1/2020 20 a
2/1/2020 21 a
2/1/2020 10 a
2/1/2019 1 b
2/1/2019 4 b
2/1/2019 5 b
Desired output
(grouping by type and date to find sum)
date size type
1/1/2020 3 a
1/1/2020 13 b
1/1/2020 1 c
2/1/2020 51 a
2/1/2019 10 b
This is what I am doing:
a.groupby(['type','date']).sum()
However, the output is not what I want, because the type is not shown on every row of the result. This is what I am getting:
Any suggestion is appreciated.
The problem I am having is with:
date size type
1/1/2020 1 c
since there is only the one value present.
Upvotes: 1
Views: 252
Reputation: 150785
When you do:
a.groupby(['type','date']).sum()
you get a new DataFrame with a MultiIndex whose levels are type and date. What you see is just how pandas displays a MultiIndex: a repeated value in the outer index level is omitted from the printout. The second row still has type == 'a'.
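To check that the blank cells are only a display effect, you can inspect the index directly. This is a minimal sketch that rebuilds the question's sample data by hand; the name grouped is mine, and the dates are kept as plain strings, which is an assumption about the original data.

```python
import pandas as pd

# Sample data copied from the question; `a` is the name used in the question's code.
a = pd.DataFrame({
    "date": ["1/1/2020"] * 7 + ["2/1/2020"] * 3 + ["2/1/2019"] * 3,
    "size": [1, 1, 1, 2, 5, 6, 1, 20, 21, 10, 1, 4, 5],
    "type": ["a"] * 3 + ["b"] * 3 + ["c"] + ["a"] * 3 + ["b"] * 3,
})

grouped = a.groupby(["type", "date"]).sum()

# The group keys live in a two-level MultiIndex, not in columns.
print(list(grouped.index.names))

# Every row carries its full (type, date) key, even where the printout leaves type blank.
print(grouped.index.tolist())

# The single 'c' row is present and can be looked up by its full key.
print(grouped.loc[("c", "1/1/2020"), "size"])
```

The printed list of index tuples has one complete (type, date) pair per row, so nothing is missing from the result itself.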
To match your expected output, i.e. to make type and date regular columns with a value on every row, you can chain the above with .reset_index() or use:
a.groupby(['type','date'], as_index=False).sum()
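Here is a self-contained sketch of both variants on the question's sample data. The names flat_a, flat_b and result are mine. The sort=False argument and the final column reordering are additions beyond the two options above, included only on the assumption that you also want the row and column order of the desired output.

```python
import pandas as pd

# Sample data copied from the question; dates are kept as plain strings (an assumption).
a = pd.DataFrame({
    "date": ["1/1/2020"] * 7 + ["2/1/2020"] * 3 + ["2/1/2019"] * 3,
    "size": [1, 1, 1, 2, 5, 6, 1, 20, 21, 10, 1, 4, 5],
    "type": ["a"] * 3 + ["b"] * 3 + ["c"] + ["a"] * 3 + ["b"] * 3,
})

# Variant 1: keep the group keys as ordinary columns from the start.
flat_a = a.groupby(["type", "date"], as_index=False).sum()

# Variant 2: group as before, then move the index levels back into columns.
flat_b = a.groupby(["type", "date"]).sum().reset_index()

# Optional: sort=False keeps groups in order of first appearance,
# and the column selection restores the question's column order.
result = a.groupby(["type", "date"], as_index=False, sort=False).sum()
result = result[["date", "size", "type"]]
print(result)
```

Both variants return the same five rows with type filled in on every row; they differ only in whether the keys are ever moved into the index.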
Upvotes: 2