Reputation: 1859
This question is similar to this one, but I want to take it a step further. Is it possible to extend the solution to work with more levels? A multilevel dataframe's .to_dict() method has some promising options, but most of them return entries that are keyed by tuples (e.g. ('A', 0, 0): 274.0) rather than nesting them in dictionaries.
For an example of what I'm looking to accomplish, consider this multiindex dataframe:
import pandas as pd

data = {0: {
            ('A', 0, 0): 274.0,
            ('A', 0, 1): 19.0,
            ('A', 1, 0): 67.0,
            ('A', 1, 1): 12.0,
            ('B', 0, 0): 83.0,
            ('B', 0, 1): 45.0
        },
        1: {
            ('A', 0, 0): 254.0,
            ('A', 0, 1): 11.0,
            ('A', 1, 0): 58.0,
            ('A', 1, 1): 11.0,
            ('B', 0, 0): 76.0,
            ('B', 0, 1): 56.0
        }
        }
# Transpose so the tuple keys become a three-level column MultiIndex
df = pd.DataFrame(data).T
df.index = ['entry1', 'entry2']
df
# output:
            A                           B
            0             1             0
            0      1      0      1      0      1
entry1  274.0   19.0   67.0   12.0   83.0   45.0
entry2  254.0   11.0   58.0   11.0   76.0   56.0
You can imagine that we have many records here, not just two, and that the index names could be longer strings. How could you turn this into nested dictionaries (or directly to JSON) that look like this:
[
 {'entry1': {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}},
             'B': {0: {0: 83.0, 1: 45.0}}},
  'entry2': {'A': {0: {0: 254.0, 1: 11.0}, 1: {0: 58.0, 1: 11.0}},
             'B': {0: {0: 76.0, 1: 56.0}}}}
]
I'm thinking some amount of recursion could potentially be helpful, maybe something like this, but have so far been unsuccessful.
Upvotes: 12
Views: 7466
Reputation: 9
I took the idea from the previous answer and slightly modified it.
1) I took the nested_dict function from Stack Overflow to create the dictionary:
from collections import defaultdict

def nested_dict(n, type):
    if n == 1:
        return defaultdict(type)
    else:
        return defaultdict(lambda: nested_dict(n-1, type))
2) I wrote the following function:
def df_to_nested_dict(df, type):
    # Assumes df has a MultiIndex on its rows, so each key below is a tuple
    # Get the number of levels
    lvl = df.index.nlevels
    # Create the target dictionary
    new_nested_dict = nested_dict(lvl, type)
    # Convert the dataframe to a dictionary keyed by index tuples
    temp_dict = df.to_dict(orient='index')
    for x, y in temp_dict.items():
        # Walk down one level per key item, then store the row
        # at the innermost level
        target = new_nested_dict
        for item in x[:-1]:
            target = target[item]
        target[x[-1]] = y
    return new_nested_dict
It is the same idea, but it is done slightly differently.
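To make the nested defaultdict idea concrete, here is a minimal sketch that uses only the standard library. The parameter name leaf_type and the helper to_plain_dict are assumptions introduced for illustration; they are not part of the answer above.

```python
from collections import defaultdict


def nested_dict(n, leaf_type):
    """Return a defaultdict that creates n levels of dictionaries on demand."""
    if n == 1:
        return defaultdict(leaf_type)
    return defaultdict(lambda: nested_dict(n - 1, leaf_type))


def to_plain_dict(d):
    """Hypothetical helper: recursively convert defaultdicts to plain dicts."""
    if isinstance(d, defaultdict):
        return {k: to_plain_dict(v) for k, v in d.items()}
    return d


# Three key levels, mirroring the ('A', 0, 0) tuples from the question.
tree = nested_dict(3, float)
tree['A'][0][0] = 274.0  # intermediate levels are created automatically
tree['A'][0][1] = 19.0
tree['B'][0][0] = 83.0

plain = to_plain_dict(tree)
print(plain)
# {'A': {0: {0: 274.0, 1: 19.0}}, 'B': {0: {0: 83.0}}}
```

Converting back to plain dicts at the end is optional, but it avoids surprises later: merely reading a missing key from a defaultdict silently creates it.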
Upvotes: 0
Reputation: 40888
So, you really need to do 2 things here:
1) Call df.to_dict()
2) Convert the result to a nested dictionary
df.to_dict(orient='index') gives you a dictionary with the index as keys; it looks like this:
>>> df.to_dict(orient='index')
{'entry1': {('A', 0, 0): 274.0,
            ('A', 0, 1): 19.0,
            ('A', 1, 0): 67.0,
            ('A', 1, 1): 12.0,
            ('B', 0, 0): 83.0,
            ('B', 0, 1): 45.0},
 'entry2': {('A', 0, 0): 254.0,
            ('A', 0, 1): 11.0,
            ('A', 1, 0): 58.0,
            ('A', 1, 1): 11.0,
            ('B', 0, 0): 76.0,
            ('B', 0, 1): 56.0}}
Now you need to nest this. Here's a trick from Martijn Pieters to do that:
def nest(d: dict) -> dict:
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:  # traverse all keys but the last
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result
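Since the question mentions recursion, here is a rough recursive sketch of the same nesting step for comparison. It is an alternative, not the trick credited above: the name nest_recursive is hypothetical, and it assumes every key is a tuple of the same length.

```python
def nest_recursive(d):
    """Nest a dict keyed by equal-length tuples, one key level per call."""
    if not d:
        return {}
    first_key = next(iter(d))
    if len(first_key) == 1:
        # Base case: only one key level left, so unwrap the 1-tuples.
        return {key[0]: value for key, value in d.items()}
    # Group entries by the first key item, keeping the rest of each tuple.
    groups = {}
    for key, value in d.items():
        groups.setdefault(key[0], {})[key[1:]] = value
    return {head: nest_recursive(rest) for head, rest in groups.items()}


flat = {
    ('A', 0, 0): 274.0,
    ('A', 0, 1): 19.0,
    ('A', 1, 0): 67.0,
    ('B', 0, 0): 83.0,
}
nested = nest_recursive(flat)
print(nested)
# {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0}}, 'B': {0: {0: 83.0}}}
```

The iterative setdefault version does the same work in a single pass without building intermediate dictionaries, so it is probably the better default; the recursive form mainly shows how the levels peel off one at a time.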
Putting this all together:
def df_to_nested_dict(df: pd.DataFrame) -> dict:
    d = df.to_dict(orient='index')
    return {k: nest(v) for k, v in d.items()}
Output:
>>> df_to_nested_dict(df)
{'entry1': {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}},
            'B': {0: {0: 83.0, 1: 45.0}}},
 'entry2': {'A': {0: {0: 254.0, 1: 11.0}, 1: {0: 58.0, 1: 11.0}},
            'B': {0: {0: 76.0, 1: 56.0}}}}
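The question also asks about going directly to JSON. Once the dictionary is nested, the standard json module can serialize it; the tuple-keyed version cannot be dumped as-is, because json.dumps raises a TypeError for tuple keys. A small sketch, using a hand-written copy of part of the output above rather than the dataframe itself:

```python
import json

# A hand-copied slice of the nested result shown above.
nested = {
    'entry1': {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}},
               'B': {0: {0: 83.0, 1: 45.0}}},
}

text = json.dumps(nested)
print(text)
# {"entry1": {"A": {"0": {"0": 274.0, "1": 19.0}, "1": {"0": 67.0, "1": 12.0}}, "B": {"0": {"0": 83.0, "1": 45.0}}}}

# JSON object keys are always strings, so the integer levels come back as '0' and '1'.
round_tripped = json.loads(text)
print(round_tripped['entry1']['A']['0']['1'])
# 19.0
```

If the integer keys matter downstream, they would need to be converted back after loading, since JSON has no way to record that they were integers.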
Upvotes: 17