Lynn

Reputation: 4408

Group by multiple columns with some having a single value (in Python)

I have a dataset, df, that I want to group by type and date, summing the size within each group:

 date        size       type

1/1/2020     1          a
1/1/2020     1          a
1/1/2020     1          a
1/1/2020     2          b
1/1/2020     5          b
1/1/2020     6          b
1/1/2020     1          c
2/1/2020     20         a
2/1/2020     21         a
2/1/2020     10         a
2/1/2019     1          b
2/1/2019     4          b     
2/1/2019     5          b

Desired output

(grouping by type and date to find sum)

  date      size                type
 1/1/2020   3                   a
 1/1/2020   13                  b
 1/1/2020   1                   c
 2/1/2020   51                  a
 2/1/2019   10                  b

This is what I am doing:

 a.groupby(['type','date']).sum() 

However, the output is not what I want, because the type is not shown on every row of the dataframe. This is what I am getting:

(image not included)

Any suggestion is appreciated.

The problem I am having is with:

date       size                type
1/1/2020   1                   c

since there is only the one value present.

Upvotes: 1

Views: 252

Answers (1)

Quang Hoang

Reputation: 150785

When you do:

 a.groupby(['type','date']).sum() 

you get a new dataframe with a MultiIndex made of type and date. What you see is only how pandas displays a MultiIndex: repeated values in the outer index level are printed once and then left blank. The values are still there, so the second row still has type == 'a'.
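A minimal sketch to check this, assuming the data from the question is loaded into a dataframe named `a` (the name `indexed` is only for illustration):

```python
import pandas as pd

# Data copied from the question; `a` matches the name used in the snippet above.
a = pd.DataFrame({
    "date": ["1/1/2020"] * 7 + ["2/1/2020"] * 3 + ["2/1/2019"] * 3,
    "size": [1, 1, 1, 2, 5, 6, 1, 20, 21, 10, 1, 4, 5],
    "type": ["a", "a", "a", "b", "b", "b", "c", "a", "a", "a", "b", "b", "b"],
})

# Grouping on two keys produces a two-level MultiIndex (type, date).
indexed = a.groupby(["type", "date"]).sum()

# Every row still carries its own `type` value; only the printed view hides repeats.
print(indexed.index.get_level_values("type").tolist())
# ['a', 'a', 'b', 'b', 'c']
```

The single `c` row is stored the same way as the others; it only looks different because it has no repeated value to hide.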

To match your expected output, i.e. to make type and date regular columns with a value on every row, you can chain the above with .reset_index() or use:

a.groupby(['type','date'], as_index=False).sum() 
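Both options can be tried on the question's data. This is a sketch that assumes the dataframe is named `a` as above; `flat` and `flat_alt` are illustrative names:

```python
import pandas as pd

# Data copied from the question.
a = pd.DataFrame({
    "date": ["1/1/2020"] * 7 + ["2/1/2020"] * 3 + ["2/1/2019"] * 3,
    "size": [1, 1, 1, 2, 5, 6, 1, 20, 21, 10, 1, 4, 5],
    "type": ["a", "a", "a", "b", "b", "b", "c", "a", "a", "a", "b", "b", "b"],
})

# Option 1: keep the group keys as regular columns from the start.
flat = a.groupby(["type", "date"], as_index=False).sum()

# Option 2: group with the default MultiIndex, then move it back into columns.
flat_alt = a.groupby(["type", "date"]).sum().reset_index()

# Optional: reorder the columns to match the layout in the question.
print(flat[["date", "size", "type"]])
```

Both give five rows with type and date filled in on every row, including the single `c` row. The rows come out sorted by the group keys (type first, then date as a string), so the order differs slightly from the desired output in the question.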

Upvotes: 2
