Robert Price

Reputation: 115

Pandas dataframe into nested child dictionary

I have a dataframe built from the data below, where each 'level' column drills down into more detail and the last level has an id value.

data = [
    {'id': 1, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Siamese Cat'},
    {'id': 2, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4': 'Javanese Cat'},
    {'id': 3, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Ursidae', 'level_4': 'Polar Bear'},
    {'id': 4, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4': 'Labradore Retriever'},
    {'id': 5, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4': 'Golden Retriever'}
]

I want to turn this into a nested dictionary of parent / child relationships like below.

var data = {
  "name": "Animals",
  "children": [
    {
      "name": "Carnivores",
      "children": [
        {
          "name": "Felidae",
          "children": [
            {
              "id": 1,
              "name": "Siamese Cat",
              "children": []
            },
            {
              "id": 2,
              "name": "Javanese Cat",
              "children": []
            }
          ]
        },
        {
          "name": "Ursidae",
          "children": [
            {
              "id": 3,
              "name": "Polar Bear",
              "children": []
            }
          ]
        },
        {
          "name": "Canidae",
          "children": [
            {
              "id": 4,
              "name": "Labradore Retriever",
              "children": []
            },
            {
              "id": 5,
              "name": "Golden Retriever",
              "children": []
            }
          ]
        }
      ]
    }
  ]
}

I've tried several approaches of grouping the dataframe and also looping over individual rows, but haven't been able to find a working solution yet. Any help would be greatly appreciated!

Upvotes: 0

Views: 582

Answers (2)

Timus

Reputation: 11321

EDIT: Had to make an adjustment, because the result wasn't exactly as expected.

Here's an attempt that produces the expected output (if I haven't made a mistake, which wouldn't be a surprise, because I've made several on the way):

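import pandas as pd

def pack_level(df):
    # Leaf level: the group still has the 'id' column, so emit one leaf per row.
    # The name is read from the last column (the deepest level).
    if df.columns[0] == 'id':
        return [{'id': i, 'name': name, 'children': []}
                for i, name in zip(df['id'], df[df.columns[-1]])]
    # Inner level: column 0 holds this level's name and the last column holds
    # the already packed child lists, which are flattened into one list.
    # (This relies on groupby.apply passing the grouping column to the function.)
    return [{'name': df.iloc[0, 0],
             'children': [entry for lst in df[df.columns[-1]]
                                for entry in lst]}]

df = pd.DataFrame(data)
columns = list(df.columns[1:])
# Pack the deepest level first, then work upwards one level at a time.
df = df.groupby(columns[:-1]).apply(pack_level)
for i in range(1, len(columns) - 1):
    df = (df.reset_index(level=-1, drop=False).groupby(columns[:-i])
                                              .apply(pack_level)
                                              .reset_index(level=-1, drop=True))

var_data = {'name': df.index[0], 'children': df.iloc[0]}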

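The result looks a bit different at first glance, but that should only be due to ordering: the keys are sorted alphabetically by the printing, and groupby sorts the group keys by default, so Canidae comes first: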

{
    "children": [
        {
            "children": [
                {
                    "children": [
                        {
                            "children": [],
                            "id": 4,
                            "name": "Labradore Retriever"
                        },
                        {
                            "children": [],
                            "id": 5,
                            "name": "Golden Retriever"
                        }
                    ],
                    "name": "Canidae"
                },
                {
                    "children": [
                        {
                            "children": [],
                            "id": 1,
                            "name": "Siamese Cat"
                        },
                        {
                            "children": [],
                            "id": 2,
                            "name": "Javanese Cat"
                        }
                    ],
                    "name": "Felidae"
                },
                {
                    "children": [
                        {
                            "children": [],
                            "id": 3,
                            "name": "Polar Bear"
                        }
                    ],
                    "name": "Ursidae"
                }
            ],
            "name": "Carnivores"
        }
    ],
    "name": "Animals"
}
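
In case the key order is confusing: the alphabetical order of "children", "id", "name" above is what you get from printing with sorted keys. A small sketch of that effect, assuming the output was printed with something like json.dumps(..., sort_keys=True) (the node below is a hypothetical one-leaf sample, not the full result):

```python
import json

# Hypothetical single leaf, only used to show the effect of sort_keys.
node = {"id": 3, "name": "Polar Bear", "children": []}

as_written = json.dumps(node)                 # keeps insertion order: id, name, children
as_sorted = json.dumps(node, sort_keys=True)  # alphabetical: children, id, name

print(as_written)
print(as_sorted)
```

The content is identical in both cases; only the order in which the keys are shown changes.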

I've tried to be as generic as possible, but the first column has to be named id (as in your sample).
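
If the groupby chain feels hard to follow, the same name / children shape can also be built with a plain loop over the rows. This is only a rough sketch of an alternative, not the groupby approach above; build_tree and its parameters are made-up names, and it assumes the rows are available as a list of dicts (for a dataframe, df.to_dict('records') gives that):

```python
def build_tree(records, level_keys, id_key="id"):
    """Nest flat records into {'name', 'children'} nodes, keeping first-seen order."""
    root_children = []
    for record in records:
        siblings = root_children
        # Walk (or create) one node per intermediate level.
        for key in level_keys[:-1]:
            name = record[key]
            node = next((n for n in siblings if n["name"] == name), None)
            if node is None:
                node = {"name": name, "children": []}
                siblings.append(node)
            siblings = node["children"]
        # The last level becomes a leaf that carries the id.
        siblings.append({id_key: record[id_key],
                         "name": record[level_keys[-1]],
                         "children": []})
    return root_children


data = [
    {"id": 1, "level_1": "Animals", "level_2": "Carnivores", "level_3": "Felidae", "level_4": "Siamese Cat"},
    {"id": 2, "level_1": "Animals", "level_2": "Carnivores", "level_3": "Felidae", "level_4": "Javanese Cat"},
    {"id": 3, "level_1": "Animals", "level_2": "Carnivores", "level_3": "Ursidae", "level_4": "Polar Bear"},
    {"id": 4, "level_1": "Animals", "level_2": "Carnivores", "level_3": "Canidae", "level_4": "Labradore Retriever"},
    {"id": 5, "level_1": "Animals", "level_2": "Carnivores", "level_3": "Canidae", "level_4": "Golden Retriever"},
]
levels = ["level_1", "level_2", "level_3", "level_4"]

tree = build_tree(data, levels)
var_data = tree[0]  # the sample has a single top-level node ("Animals")
```

Because nothing is sorted here, the families come out in the order they first appear in the data (Felidae, Ursidae, Canidae). The sibling lookup is a linear scan, which should be fine for small hierarchies but is not tuned for large ones.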

Upvotes: 2

Rik Kraan

Reputation: 586

The answer from @Timus matches your intended output. However, searching that dictionary can be awkward, because every level uses the same two keys, name and children. If that structure is what you need, ignore my answer. If you would rather have a dictionary that you can search by unique keys, you can try:

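# df is the dataframe built from the question's data, e.g. pd.DataFrame(data)
df = df.set_index(['level_1', 'level_2', 'level_3', 'level_4'])

def make_dictionary(df):
    # Base case: only the last level is left, so map its labels to the ids.
    if df.index.nlevels == 1:
        return df.to_dict()

    dictionary = {}
    for key in df.index.get_level_values(0).unique():
        # xs selects the rows under this key and drops the outermost index level.
        sub_df = df.xs(key)
        dictionary[key] = make_dictionary(sub_df)
    return dictionary

make_dictionary(df)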

It requires setting the level columns as the index, and you end up with a slightly different dictionary:

{'Animals': 
    {'Carnivores': 
        {'Felidae': 
          {'id': {'Siamese Cat': 1,
                  'Javanese Cat': 2}},
         'Ursidae': 
          {'id': {'Polar Bear': 3}},
         'Canidae': 
          {'id': {'Labradore Retriever': 4, 
                  'Golden Retriever': 5}}}
    }
}
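
To illustrate the difference in searching, here is a small sketch that looks up the same id in both shapes. The two dictionaries are trimmed, hand-written copies of the structures in this thread, and find_by_name is a made-up helper, not part of pandas:

```python
def find_by_name(node, name):
    """Depth-first search of a {'name', 'children'} tree; returns the node or None."""
    if node["name"] == name:
        return node
    for child in node["children"]:
        found = find_by_name(child, name)
        if found is not None:
            return found
    return None


# name / children shape: every level has the same keys, so a search has to walk the tree.
tree = {"name": "Animals", "children": [
    {"name": "Carnivores", "children": [
        {"name": "Ursidae", "children": [
            {"id": 3, "name": "Polar Bear", "children": []}]}]}]}

# Keyed shape: each level is addressed directly by its own label.
keyed = {"Animals": {"Carnivores": {"Ursidae": {"id": {"Polar Bear": 3}}}}}

polar_from_tree = find_by_name(tree, "Polar Bear")["id"]
polar_from_keyed = keyed["Animals"]["Carnivores"]["Ursidae"]["id"]["Polar Bear"]
```

The keyed lookup needs the full path, while the recursive search only needs the name but visits nodes until it finds a match.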

Upvotes: 3
