Ashutosh Srivastava

Reputation: 95

Multi-level groupby and sum in pandas

I need to group a DataFrame by multiple columns and sum within each group. I tried groupby, but it seemed to be limited to two levels.

import pandas as pd

df_dic = {'col1': [1, 2, 3, 2, 1], 'year': ['2019', '2019', '2020', '2020', '2019'], 'week': ['37', '38', '1', '2', '37'], 'product': [1, 1, 1, 1, 1], 'se': [1, 0, 0, 0, 1], 'sqe': [0, 1, 0, 0, 1]}
df = pd.DataFrame(df_dic)

Dataframe:

   col1  year week  product  se  sqe
0     1  2019   37        1   1    0
1     2  2019   38        1   0    1
2     3  2020    1        1   0    0
3     2  2020    2        1   0    0
4     1  2019   37        1   1    1

What I tried: in my latest attempt I was able to get the week count per year, but I expect to get the sums of product, se and sqe for each week.

Expected Result:

{
  "2019": {
    "37": {
      "Product": 2,
      "SE": 2,
      "SQE": 1
    },
    "38": {
      "Product": 1,
      "SE": 0,
      "SQE": 1
    }
  },
  "2020": {
    "1": {
      "Product": 2,
      "SE": 0,
      "SQE": 0
    }
  }
}

Any help would be appreciated. Note: product, se and sqe cannot be combined into one column.

Upvotes: 1

Views: 862

Answers (2)

Valdi_Bo

Reputation: 30991

To use my solution, the grouping keys must be unique, so I had to drop the last row of your data sample, because year == 2019 and week == 37 had already occurred earlier.

To get your expected result, you can run:

df.drop(columns='col1').set_index(['year', 'week']).groupby('year').apply(
    lambda grp: grp.reset_index(level=0, drop=True).to_dict(orient='index')).to_dict()

For your data sample (without the last row) I got:

{2019: {37: {'product': 1, 'se': 1, 'sqe': 0},
        38: {'product': 1, 'se': 0, 'sqe': 1}},
 2020: { 1: {'product': 1, 'se': 0, 'sqe': 0},
         2: {'product': 1, 'se': 0, 'sqe': 0}}}
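If dropping the duplicated row is not acceptable, one possible variation is to sum the value columns per (year, week) first, which makes the keys unique, and only then build the nested dict. This is a minimal sketch rather than the code above; the names totals and result are chosen for illustration, and year and week are assumed to be strings as in the question's df_dic:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 2, 1],
    'year': ['2019', '2019', '2020', '2020', '2019'],
    'week': ['37', '38', '1', '2', '37'],
    'product': [1, 1, 1, 1, 1],
    'se': [1, 0, 0, 0, 1],
    'sqe': [0, 1, 0, 0, 1],
})

# Sum the value columns per (year, week); the resulting index is unique.
totals = df.groupby(['year', 'week'])[['product', 'se', 'sqe']].sum()

# Build the nested dict: the outer key is the year, the inner key is the week.
result = {}
for (year, week), row in totals.iterrows():
    # int() converts NumPy integers to plain Python ints.
    result.setdefault(year, {})[week] = {col: int(val) for col, val in row.items()}

print(result)
```

With the full sample, the two rows for year 2019, week 37 are added together instead of one of them being discarded.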

This code can be extended to more levels, but you must explicitly specify every grouping level you want.
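As a rough illustration of going beyond two levels, the nesting can be driven by a list of key columns. The helper below, nested_sum, is a hypothetical name and not a pandas function; it sums first, so duplicate keys are allowed, and it is only a sketch assuming integer value columns:

```python
import pandas as pd

def nested_sum(df, keys, value_cols):
    """Sum value_cols per combination of keys and nest the result in key order."""
    totals = df.groupby(keys)[value_cols].sum()
    out = {}
    for idx, row in totals.iterrows():
        # A single grouping key gives a scalar index label, several give a tuple.
        path = idx if isinstance(idx, tuple) else (idx,)
        node = out
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = {col: int(val) for col, val in row.items()}
    return out

df = pd.DataFrame({
    'col1': [1, 2, 3, 2, 1],
    'year': ['2019', '2019', '2020', '2020', '2019'],
    'week': ['37', '38', '1', '2', '37'],
    'product': [1, 1, 1, 1, 1],
    'se': [1, 0, 0, 0, 1],
    'sqe': [0, 1, 0, 0, 1],
})
cols = ['product', 'se', 'sqe']

two_levels = nested_sum(df, ['year', 'week'], cols)
three_levels = nested_sum(df, ['year', 'week', 'col1'], cols)
```

Adding a level then only means adding a column name to keys, for example ['year', 'week', 'col1'].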

Upvotes: 0

DavideBrex

Reputation: 2414

Try:

df.groupby(by="year").apply(lambda grp: grp.groupby(by="week")[["product","se","sqe"]].sum().to_dict("index")).to_dict()

Output:

{'2019': 
       {'37': {'product': 2, 'se': 2, 'sqe': 1},
        '38': {'product': 1, 'se': 0, 'sqe': 1}},

 '2020': 
       {'1': {'product': 1, 'se': 0, 'sqe': 0},
        '2': {'product': 1, 'se': 0, 'sqe': 0}}}
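The expected result in the question uses the labels "Product", "SE" and "SQE" and is written as JSON. One way to get there, sketched under the assumption that only the column labels need to change, is to rename the columns after summing and serialize with json.dumps. The labels mapping and the variable nested are illustrative names:

```python
import json
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 2, 1],
    'year': ['2019', '2019', '2020', '2020', '2019'],
    'week': ['37', '38', '1', '2', '37'],
    'product': [1, 1, 1, 1, 1],
    'se': [1, 0, 0, 0, 1],
    'sqe': [0, 1, 0, 0, 1],
})

# Map the DataFrame column names to the labels used in the expected result.
labels = {'product': 'Product', 'se': 'SE', 'sqe': 'SQE'}

totals = df.groupby(['year', 'week'])[list(labels)].sum().rename(columns=labels)

nested = {
    year: {
        week: {name: int(val) for name, val in row.items()}
        for week, row in grp.droplevel('year').iterrows()
    }
    for year, grp in totals.groupby(level='year')
}

# Plain Python ints and str keys make the dict JSON-serializable.
print(json.dumps(nested, indent=2))
```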

Upvotes: 1
