Kevin
Kevin

Reputation: 83

Convert result from groupby on multiple columns to list of dictionaries of dictionaries

I am trying to convert result from pandas groupby function to a array of dictionaries of dictionaries.

Code run:

df = df.groupby(['DE', 'SW', 'Tech'])['COUNT(*)'].sum()

Output:

DE          SW  Tech  
2021-01-01  s1  t1          1
2021-04-02  s1  t1        699
                t2        268
2021-04-06  s1  t1          2
            s2  t1        466
                t2         81
                t3        954
Name: COUNT(*), Length: 474, dtype: int64

Expected output:

[
  {'2021-01-01': {'s1':{'t1': 1}} },
  {'2021-04-02': {'s1':{'t1': 699, 't2': 268}} },
  {'2021-04-06': {'s1':{'t1': 2}, 's2':{'t1': 466,'t2': 81,'t3': 954}} },
]

I tried using reset_index and then set_index('DE') but that throws an error due to non unique index. Using multiple values in set_index doesn't give the required output either.

Note: the columns used in df.groupby can also increase or decrease.

Please advice on how to achieve the expected output.

Upvotes: 1

Views: 223

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Let us try defining a function dictify that takes the input argument as the multiindex pandas series and return the nested dictionary in the required format

def dictify(s):
    if s.index.nlevels == 1: return s.to_dict()
    return {k: dictify(g.droplevel(0)) for k, g in s.groupby(level=0)}

records = [{k: v} for k, v in dictify(df).items()]

>>> df

DE          SW  Tech  
2021-01-01  s1  t1          1
2021-04-02  s1  t1        699
                t2        268
2021-04-06  s1  t1          2
            s2  t1        466
                t2         81
                t3        954
Name: COUNT(*), Length: 474, dtype: int64

>>> records

[{'2021-01-01': {'s1': {'t1': 1}}},
 {'2021-04-02': {'s1': {'t1': 699, 't2': 268}}},
 {'2021-04-06': {'s1': {'t1': 2}, 's2': {'t1': 466, 't2': 81, 't3': 954}}}]

Upvotes: 1

Related Questions