Pandas dataframe into nested child dictionary

Question

I have a dataframe like below, where each 'level' drills down into more detail, with the last level having an id value.

data = [
    {'id': 1, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Siamese Cat'},
    {'id': 2, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Javanese Cat'},
    {'id': 3, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Ursidae', 'level_4': 'Polar Bear'},
    {'id': 4, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4': 'Labradore Retriever'},
    {'id': 5, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4': 'Golden Retriever'}
]

I want to turn this into a nested dictionary of parent / child relationships like below.

var data = {
  "name": "Animals",
  "children": [
    {
      "name": "Carnivores",
      "children": [
        {
          "name": "Felidae",
          "children": [
            {
              "id": 1,
              "name": "Siamese Cat",
              "children": []
            },
            {
              "id": 2,
              "name": "Javanese Cat",
              "children": []
            }
          ]
        },
        {
          "name": "Ursidae",
          "children": [
            {
              "id": 3,
              "name": "Polar Bear",
              "children": []
            }
          ]
        },
        {
          "name": "Canidae",
          "children": [
            {
              "id": 4,
              "name": "Labradore Retriever",
              "children": []
            },
            {
              "id": 5,
              "name": "Golden Retriever",
              "children": []
            }
          ]
        }
      ]
    }
  ]
}

I've tried several approaches of grouping the dataframe and also looping over individual rows, but haven't been able to find a working solution yet. Any help would be greatly appreciated!

Timus · Accepted Answer

EDIT: Had to make an adjustment, because the result wasn't exactly as expected.

Here's an attempt that produces the expected output (if I haven't made a mistake, which wouldn't be a surprise, because I've made several on the way):

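import pandas as pd

def pack_level(df):
    # Innermost level: one leaf per row, carrying the id and the last-level name.
    if df.columns[0] == 'id':
        return [{'id': i, 'name': name, 'children': []}
                for i, name in zip(df[df.columns[0]], df[df.columns[-1]])]
    # Higher levels: column 0 holds this level's name, column 1 the child lists
    # packed in the previous pass; flatten them into one list of children.
    return [{'name': df.iloc[0, 0],
             'children': [entry for lst in df[df.columns[1]]
                                for entry in lst]}]

# Note: the loop relies on groupby(...).apply passing the grouping columns to
# pack_level, a behaviour that pandas 2.2+ deprecates.
df = pd.DataFrame(data)
columns = list(df.columns[1:])
# Pack the leaves first, then fold one level per pass, from the deepest upwards.
df = df.groupby(columns[:-1]).apply(pack_level)
for i in range(1, len(columns) - 1):
    df = (df.reset_index(level=-1, drop=False).groupby(columns[:-i])
                                              .apply(pack_level)
                                              .reset_index(level=-1, drop=True))

var_data = {'name': df.index[0], 'children': df.iloc[0]}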

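The result looks a bit different at first glance, but only the ordering differs: the keys of each dictionary are sorted alphabetically by the printing, and groupby sorts the groups by key by default (passing sort=False to groupby keeps them in order of first appearance):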

{
    "children": [
        {
            "children": [
                {
                    "children": [
                        {
                            "children": [],
                            "id": 4,
                            "name": "Labradore Retriever"
                        },
                        {
                            "children": [],
                            "id": 5,
                            "name": "Golden Retriever"
                        }
                    ],
                    "name": "Canidae"
                },
                {
                    "children": [
                        {
                            "children": [],
                            "id": 1,
                            "name": "Siamese Cat"
                        },
                        {
                            "children": [],
                            "id": 2,
                            "name": "Javanese Cat"
                        }
                    ],
                    "name": "Felidae"
                },
                {
                    "children": [
                        {
                            "children": [],
                            "id": 3,
                            "name": "Polar Bear"
                        }
                    ],
                    "name": "Ursidae"
                }
            ],
            "name": "Carnivores"
        }
    ],
    "name": "Animals"
}
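
To check where the different ordering comes from, here is a small sketch on a cut-down frame (the frame and the variable names are made up for illustration, they are not part of the code above). It compares the group order with and without sort=False, and shows that sort_keys only affects the printed text:

import json
import pandas as pd

# Cut-down, illustrative frame: only the third level and the id.
df = pd.DataFrame({
    'level_3': ['Felidae', 'Felidae', 'Ursidae', 'Canidae', 'Canidae'],
    'id': [1, 2, 3, 4, 5],
})

# Default groupby: group keys come back sorted alphabetically.
sorted_keys = list(df.groupby('level_3').size().index)

# sort=False: groups stay in order of first appearance.
original_keys = list(df.groupby('level_3', sort=False).size().index)

# sort_keys=True sorts the keys in the printed text only; the dict is unchanged.
node = {'name': 'Polar Bear', 'id': 3, 'children': []}
printed = json.dumps(node, sort_keys=True)

print(sorted_keys)
print(original_keys)
print(printed)

So the nesting itself is the same as in the expected output; only the order of siblings and of the printed keys changes.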

I've tried to be as generic as possible, but the first column has to be named id (as in your sample), and the remaining columns have to be the levels, ordered from the broadest to the most detailed.
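
If the dependence on the column layout is a problem, one possible alternative (a different technique from the groupby approach above, only a sketch) is to walk the rows in plain Python and create the nodes on the way down. The helper name build_tree and its parameters are hypothetical; the column names are taken from the sample in the question:

import pandas as pd

def build_tree(df, id_col, level_cols):
    """Hypothetical helper: nest rows by level_cols; leaves carry id_col."""
    roots = []
    for row in df.to_dict('records'):
        siblings = roots
        # Walk down the intermediate levels, creating missing nodes.
        for col in level_cols[:-1]:
            node = next((n for n in siblings if n['name'] == row[col]), None)
            if node is None:
                node = {'name': row[col], 'children': []}
                siblings.append(node)
            siblings = node['children']
        # The last level becomes a leaf that also stores the id.
        siblings.append({'id': row[id_col],
                         'name': row[level_cols[-1]],
                         'children': []})
    return roots

data = [
    {'id': 1, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Siamese Cat'},
    {'id': 2, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Javanese Cat'},
    {'id': 3, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Ursidae', 'level_4': 'Polar Bear'},
    {'id': 4, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4': 'Labradore Retriever'},
    {'id': 5, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4': 'Golden Retriever'},
]
df = pd.DataFrame(data)
tree = build_tree(df, 'id', ['level_1', 'level_2', 'level_3', 'level_4'])
var_data = tree[0]  # the sample has a single top-level node

This keeps siblings in order of first appearance and doesn't care where the id column sits. The linear search through the siblings is fine for small trees; for large ones a dictionary lookup per level would probably be the better choice.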
