Reputation: 115
I have a dataframe like below, where each 'level' drills down into more detail, with the last level having an id value.
data = [
{'id': 1, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4', 'Siamese Cat'},
{'id': 2, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Felidae', 'level_4', 'Javanese Cat'},
{'id': 3, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Ursidae', 'level_4', 'Polar Bear'},
{'id': 4, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4', 'Labradore Retriever'},
{'id': 5, 'level_1': 'Animals', 'level_2': 'Carnivores', 'level_3': 'Canidae', 'level_4', 'Golden Retriever'}
]
I want to turn this into a nested dictionary of parent / child relationships like below.
var data = {
"name": "Animals",
"children": [
{
"name": "Carnivores",
"children": [
{
"name": "Felidae",
"children": [
{
"id": 1,
"name": "Siamese Cat",
"children": []
},
{
"id": 2,
"name": "Javanese Cat",
"children": []
}
]
},
{
"name": "Ursidae",
"children": [
{
"id": 3,
"name": "Polar Bear",
"children": []
}
]
},
{
"name": "Canidae",
"children": [
{
"id": 4,
"name": "Labradore Retriever",
"children": []
},
{
"id": 5,
"name": "Golden Retriever",
"children": []
}
]
}
]
}
]
}
I've tried several approaches of grouping the dataframe and also looping over individual rows, but haven't been able to find a working solution yet. Any help would be greatly appreciated!
Upvotes: 0
Views: 582
Reputation: 11321
EDIT: Had to make an adjustment, because the result wasn't exactly as expected.
Here's an attempt that produces the expected output (if I haven't made a mistake, which wouldn't be a surprise, because I've made several on the way):
def pack_level(df):
if df.columns[0] == 'id':
return [{'id': i, 'name': name, 'children': []}
for i, name in zip(df[df.columns[0]], df[df.columns[1]])]
return [{'name': df.iloc[0, 0],
'children': [entry for lst in df[df.columns[1]]
for entry in lst]}]
df = pd.DataFrame(data)
columns = list(df.columns[1:])
df = df.groupby(columns[:-1]).apply(pack_level)
for i in range(1, len(columns) - 1):
df = (df.reset_index(level=-1, drop=False).groupby(columns[:-i])
.apply(pack_level)
.reset_index(level=-1, drop=True))
var_data = {'name': df.index[0], 'children': df.iloc[0]}
The result looks a bit different at first glance, but that should be only due to the sorting (from printing):
{
"children": [
{
"children": [
{
"children": [
{
"children": [],
"id": 4,
"name": "Labradore Retriever"
},
{
"children": [],
"id": 5,
"name": "Golden Retriever"
}
],
"name": "Canidae"
},
{
"children": [
{
"children": [],
"id": 1,
"name": "Siamese Cat"
},
{
"children": [],
"id": 2,
"name": "Javanese Cat"
}
],
"name": "Felidae"
},
{
"children": [
{
"children": [],
"id": 3,
"name": "Polar Bear"
}
],
"name": "Ursidae"
}
],
"name": "Carnivores"
}
],
"name": "Animals"
}
I've tried to be as generic as possible, but the first column has to be named id
(as in your sample).
Upvotes: 2
Reputation: 586
The answer of @Timus mimics your intention, however you might encounter some difficulties searching this dictionary as each level has a key name
and a key children
. If this is what you intended ignore my answer. However, if you would like to create a dictionary in which you can more easily search through unique keys you can try:
df = df.set_index(['level_1', 'level_2', 'level_3', 'level_4'])
def make_dictionary(df):
if df.index.nlevels == 1:
return df.to_dict()
dictionary = {}
for key in df.index.get_level_values(0).unique():
sub_df = df.xs(key)
dictionary[key] = df_to_dict(sub_df)
return dictionary
make_dictionary(df)
It requires setting the different levels as index, and you will end up with a slightly different dictionary:
{'Animals':
{'Carnivores':
{'Felidae':
{'id': {'Siamese Cat': 1,
'Javanese Cat': 2}},
'Ursidae':
{'id': {'Polar Bear': 3}},
'Canidae':
{'id': {'Labradore Retriever': 4,
'Golden Retriever': 5}}}
}
}
Upvotes: 3