Reputation: 83
I am trying to convert result from pandas groupby function to a array of dictionaries of dictionaries.
Code run:
df = df.groupby(['DE', 'SW', 'Tech'])['COUNT(*)'].sum()
Output:
DE SW Tech
2021-01-01 s1 t1 1
2021-04-02 s1 t1 699
t2 268
2021-04-06 s1 t1 2
s2 t1 466
t2 81
t3 954
Name: COUNT(*), Length: 474, dtype: int64
Expected output:
[
{'2021-01-01': {'s1':{'t1': 1}} },
{'2021-04-02': {'s1':{'t1': 699, 't2': 268}} },
{'2021-04-06': {'s1':{'t1': 2}, 's2':{'t1': 466,'t2': 81,'t3': 954}} },
]
I tried using reset_index
and then set_index('DE')
but that throws an error due to non unique index. Using multiple values in set_index
doesn't give the required output either.
Note: the columns used in df.groupby
can also increase or decrease.
Please advice on how to achieve the expected output.
Upvotes: 1
Views: 223
Reputation: 71689
Let us try defining a function dictify
that takes the input argument as the multiindex pandas series and return the nested dictionary in the required format
def dictify(s):
if s.index.nlevels == 1: return s.to_dict()
return {k: dictify(g.droplevel(0)) for k, g in s.groupby(level=0)}
records = [{k: v} for k, v in dictify(df).items()]
>>> df
DE SW Tech
2021-01-01 s1 t1 1
2021-04-02 s1 t1 699
t2 268
2021-04-06 s1 t1 2
s2 t1 466
t2 81
t3 954
Name: COUNT(*), Length: 474, dtype: int64
>>> records
[{'2021-01-01': {'s1': {'t1': 1}}},
{'2021-04-02': {'s1': {'t1': 699, 't2': 268}}},
{'2021-04-06': {'s1': {'t1': 2}, 's2': {'t1': 466, 't2': 81, 't3': 954}}}]
Upvotes: 1