Reputation: 279
I have a nested dictionary as below. I'm trying to convert the below to a dataframe with the columns iid, Invnum, @type, execId, CId, AId, df, type. What’s the best way to go about it?
data = {'A': {'B1': {'iid': 'B1', 'Invnum': {'B11': {'@type': '/test_data', 'execId': 42, 'CId': 42, 'AId': 'BAZ'}, 'B12': {'@type': '/test_data', 'CId': 8, 'AId': '123'}}}}, 'B2': {'iid': 'B2', 'Invnum': {'B21': {'@type': '/test_data', 'execId': 215, 'CId': 253,'df': [], 'type': 'F'}, 'B22': {'@type': '/test_data', 'execId': 10,'df': [], 'type': 'F'}}}}
for key1 in data['A'].keys():
for key2 in data['A'][key1]['Invnum']:
print(key1,key2)
Expected output:
Upvotes: 0
Views: 213
Reputation: 2159
As indicated in the comments, your input data is very obscure. This provides a lot of trouble for us, because we don't know what we can assume or not. For my solution I will assume at least the following, based on the example you provide:
iid
and Invnum
as keys in the same level.Invnum
key is the only key, which has multiple values, or in otherwords is iterable (besides df
), and on iteration it must hold the last dictionary. In otherwords, after the Invnum
value (e.g. B11
), you can only get the last dict with the other fields as keys (@type
, execId
, CId
, AId
, df
, type
), if they exists.df
value, it will hold a list
.# This is a place holder dictionary, so we can create entries that have the same pattern.
entry = {'@type': '', 'execId': '', 'CId': '', 'AId': '', 'df': '', 'type': ''}
# This will hold all the (properly) format entries for the df.
items = []
def flatten(data):
if isinstance(data, dict):
match data:
# We are searching for a level that contains both an `iid` and `Invnum` key.
case {'iid': id, 'Invnum': remainder}:
for each in remainder:
entry_row = dict(**entry, iid=id, Invnum=each)
entry_row.update(remainder[each])
items.append(entry_row)
case _:
for key, value in data.items():
flatten(value)
# We flatten the data, such that the `items` variable will hold consistent entries
flatten(data)
# Transfer to pandas dataframe, and reorder the values for easy comparison.
df = pd.DataFrame(items)
df = df[['iid', 'Invnum', '@type', 'execId', 'CId', 'AId', 'df', 'type']]
print(df.to_string(index=False))
Output:
iid Invnum @type execId CId AId df type
B1 B11 /test_data 42 42 BAZ
B1 B12 /test_data 8 123
B2 B21 /test_data 215 253 [] F
B2 B22 /test_data 10 [] F
Note:
''
for empty values.Structural pattern matching
, which is introduced in python 3.10.Upvotes: 1