zesla
zesla

Reputation: 11793

Convert a three-level hierarchical data into a specific json format

I have a three-level hierarchical dataset, like below:

pd.DataFrame({'level1': ['A', 'A', 'A', 'A', 'B'],
               'level2': ['A1', 'A1', 'A2', 'A2', 'B1' ],
                'level3': ['a', 'b', 'c', 'd', 'e'],
                'value': [44, 125, 787, 99, 111],
                'pctChg': [0.3, -0.9, -10.0, 12, -0.2]})


  level1    level2  level3  value   pctChg
0   A       A1      a       44       0.3
1   A       A1      b       125     -0.9
2   A       A2      c       787     -10.0
3   A       A2      d       99      12.0
4   B       B1      e       111     -0.2

For a specific level1 category like A, there are level2 category like A1, A2. under levels two, there are many level3 categories. For example, 'a' and 'b' are under 'A1', 'c' and 'd' are under A2. This data is just an example. For each combination, there are value and percent info (percentage change from last month).

I need to tranform this data into a nested json data. It needs to be the format like below:

{
    name: “root”,
    value: 1166,
    pctChg: xx%,
    children: [
    {
        name: 'A',
        value: 956,
        pctChg: 'xx%'',
        children: [{
            name: 'A1',
            value: 169,
            pctChg: 'xx%'',
            children: [{name: 'a', value: 44, pctChg: '30%'},
                       {name:'b', value:125, pctChg: '-90%'},
                       {name:'c', value:787, pctChg: '-10%'}           
            ]
        },  .....]
    },
.....…
    ]
    }

We also need to aggregate the value for a level from all childrens at one level down. Value can be aggragated obviously. One tricky part is percent. We might not want to simply aggregate the percentage.

This looks like a pretty difficult task. Not like some simple nested json data. I'm not sure how I can approach that. Appreciate if anyone can help. Thanks a lot in advance.

Upvotes: 1

Views: 504

Answers (1)

Valdi_Bo
Valdi_Bo

Reputation: 30991

The first step is to reformat pctChg column to percentage string:

df.pctChg = (df.pctChg * 100).astype(int).astype(str) + '%'

(I assumed multiply by 100 formula).

Then define 2 functions computing children of the second and first level:

def chld2(grp):
    return grp.rename(columns={'level3': 'name'}).groupby('level2')\
        .apply(lambda grp: pd.Series({'name': grp.iloc[0,1], 'value': grp.value.sum(),
        'pctChg': 'xx%', 'children': grp[['name', 'value', 'pctChg']].to_dict('r') }))\
        .to_dict('r')

def chld1(df):
    return df.groupby('level1').apply(lambda grp: pd.Series({
        'name': grp.iloc[0,0], 'value': grp.value.sum(), 'pctChg': 'xx%',
        'children': chld2(grp)})).to_dict('r')

And to generate the result, run:

pd.Series({'name': 'root', 'value': df.value.sum(), 'pctChg': 'xx%',
    'children': chld1(df)}).to_json()

The result (with manually added indentation for readability) is:

{ "name":"root", "value":1166, "pctChg":"xx%",
  "children":[
    { "name":"A", "value":1055, "pctChg":"xx%",
      "children":[
        { "name":"A1", "value":169, "pctChg":"xx%",
          "children":[
            {"name":"a", "value":44, "pctChg":"30%"},
            {"name":"b", "value":125, "pctChg":"-90%"}
          ]
        },
        { "name":"A2", "value":886, "pctChg":"xx%",
          "children":[
            {"name":"c", "value":787, "pctChg":"-1000%"},
            {"name":"d", "value":99, "pctChg":"1200%"}
          ]
        }
      ]
    },
    { "name":"B", "value":111, "pctChg":"xx%",
      "children":[
        { "name":"B1", "value":111, "pctChg":"xx%",
          "children":[
            {"name":"e", "value":111, "pctChg":"-20%"}
          ]
        }
      ]
    }
  ]
}

Upvotes: 1

Related Questions