Rishabh Malhotra
Rishabh Malhotra

Reputation: 99

creating a json object from pandas dataframe

      Groups sub-groups selections
    0   sg1    csg1       sc1
    1   sg1    csg1       sc2
    2   sg1    csg2       sc3
    3   sg1    csg2       sc4
    4   sg2    csg3       sc5
    5   sg2    csg3       sc6
    6   sg2    csg4       sc7
    7   sg2    csg4       sc8

I have the dataframe mentioned above and I am trying to create a JSON object as follows:

{
  "sg1": {
    "csg1": ['sc1', 'sc2'],
    "csg2": ['sc3', 'sc4']
  },
  "sg2": {
    "csg3": ['sc5', 'sc6'],
    "csg4": ['sc7', 'sc8']
  }
}

I tried using the pandas to_json and to_dict with orient arguments but I am not getting the expected result. I also tried grouping by the columns and then creating the list and converting it into a JSON.

Any help is much appreciated.

Upvotes: 5

Views: 570

Answers (3)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Lets try dictify function which builds a nested dictionary with top level keys from the Groups and corresponding sub level keys from sub-groups:

from collections import defaultdict

def dictify():
    dct = defaultdict(dict)
    for (x, y), g in df.groupby(['Groups', 'sub-groups']):
        dct[x][y] = [*g['selections']]
    return dict(dct)

# dictify()
{
    "sg1": {
        "csg1": ["sc1","sc2"],
        "csg2": ["sc3","sc4"]
    },
    "sg2": {
        "csg3": ["sc5","sc6"],
        "csg4": ["sc7","sc8"]
    }
}

Upvotes: 0

deponovo
deponovo

Reputation: 1432

You need to group on the columns of interest such as:

import pandas as pd

data = {
        'Groups': ['sg1', 'sg1', 'sg1', 'sg1', 'sg2', 'sg2', 'sg2', 'sg2'],
        'sub-groups': ['csg1', 'csg1', 'csg2', 'csg2', 'csg3', 'csg3', 'csg4', 'csg4'],
        'selections': ['sc1', 'sc2', 'sc3', 'sc4', 'sc5', 'sc6', 'sc7', 'sc8']
}

df = pd.DataFrame(data)
print(df.groupby(['Groups', 'sub-groups'])['selections'].unique().to_dict())

The output is:

{
    ('sg1', 'csg1'): array(['sc1', 'sc2'], dtype=object), 
    ('sg1', 'csg2'): array(['sc3', 'sc4'], dtype=object), 
    ('sg2', 'csg3'): array(['sc5', 'sc6'], dtype=object), 
    ('sg2', 'csg4'): array(['sc7', 'sc8'], dtype=object)
}

Upvotes: 0

yatu
yatu

Reputation: 88246

You can groupby ['Groups','sub-groups'] and build a dictionary from the multiindex series with a dictionary comprehension:

s = df.groupby(['Groups','sub-groups']).selections.agg(list)
d = {k1:{k2:v} for (k1,k2),v in s.iteritems()}

print(d)
# {'sg1': {'csg2': ['sc3', 'sc4']}, 'sg2': {'csg4': ['sc7', 'sc8']}}

Upvotes: 5

Related Questions