Pandas combining keys when grouping by multiple column

Question

I have 3 levels of grouping based on 3 keys: key1, key2, key3 I want to get the sum of a column (c1) for the following combination:

key1, sum(c1)
key1, key2, sum(c1)
key1, key2, key3, sum(c1)

I am getting the sums in 3 different dfs. (sum_k1, sum_k1k2, sum_k1k2k3) I want to combine the dataframe and thereupon convert it to json as follows:

How do I go about this?

DocZer&#248; · Accepted Answer

I don't know if this is the most efficient way to go about it, but this is what I came up with

import pandas as pd
import random

# Prepare the sample dataset

table = []
for i in range(100000):
    row = {'key1': random.choice('ABC'),
           'key2': random.choice('KLM'),
           'key3': random.choice('XYZ'),
           'val' : random.randint(0,500)}
    table.append(row)

df = pd.DataFrame(table)

# Aggregate the first level

dict_agg = (df.groupby('key1')
            .sum()
            .rename(columns={'val':'sum'})
            .to_dict('index'))

# Convert from numpy.int64 to Python scalar
for idx, value in dict_agg.items():
    dict_agg[idx]['sum'] = int(dict_agg[idx]['sum'])

# Aggregate the second level

df_lvl2 = (df.groupby(['key1','key2'])
           .sum()
           .rename(columns={'val':'sum'})
           .to_dict('index'))

# Assign the second level aggregation

for idx, value in df_lvl2.items():
    dict_agg[idx[0]][idx[1]] = {'sum': int(value['sum'])}

# Aggregate the final level

df_lvl3 = (df.groupby(['key1','key2','key3'])
           .sum()
           .rename(columns={'val':'sum'})
           .to_dict('index'))

# Assign the third level aggregation

for idx, value in df_lvl3.items():
    dict_agg[idx[0]][idx[1]][idx[2]] = {'sum': int(value['sum'])}

The end result will look like this:

{'A': {'K': {'X': {'sum': 929178},
   'Y': {'sum': 940925},
   'Z': {'sum': 938008},
   'sum': 2808111},
  'L': {'X': {'sum': 902581},
   'Y': {'sum': 953821},
   'Z': {'sum': 942942},
   'sum': 2799344},
  'M': {'X': {'sum': 930117},
   'Y': {'sum': 929257},
   'Z': {'sum': 910905},
   'sum': 2770279},
  'sum': 8377734},
 'B': {'K': {'X': {'sum': 888818},
…

As this is a dict, you need to convert it to json, by doing:

import json
output = json.dumps(dict_agg)

Pandas combining keys when grouping by multiple column

Answers (2)

Related Questions