How can I explode a nested dictionary into a dataframe?

I have a nested dictionary as below. I'm trying to convert the below to a dataframe with the columns iid, Invnum, @type, execId, CId, AId, df, type. What’s the best way to go about it?

data = {'A': {'B1': {'iid': 'B1', 'Invnum': {'B11': {'@type': '/test_data', 'execId': 42, 'CId': 42, 'AId': 'BAZ'}, 'B12': {'@type': '/test_data', 'CId': 8, 'AId': '123'}}}}, 'B2': {'iid': 'B2', 'Invnum': {'B21': {'@type': '/test_data', 'execId': 215, 'CId': 253,'df': [], 'type': 'F'}, 'B22': {'@type': '/test_data', 'execId': 10,'df': [], 'type': 'F'}}}}


for key1 in data['A'].keys():
    for key2 in data['A'][key1]['Invnum']:
        print(key1,key2)

Expected output:

Upvotes: 0

Answers (1)

Thymen

Reputation: 2159

As indicated in the comments, your input data is very obscure. This provides a lot of trouble for us, because we don't know what we can assume or not. For my solution I will assume at least the following, based on the example you provide:

In the dictionary there is an entry containing the iid and Invnum as keys in the same level.
The Invnum key is the only key, which has multiple values, or in otherwords is iterable (besides df), and on iteration it must hold the last dictionary. In otherwords, after the Invnum value (e.g. B11), you can only get the last dict with the other fields as keys (@type, execId, CId, AId, df, type), if they exists.
If there is a df value, it will hold a list.

# This is a place holder dictionary, so we can create entries that have the same pattern.
entry = {'@type': '', 'execId': '', 'CId': '', 'AId': '', 'df': '', 'type': ''}

# This will hold all the (properly) format entries for the df.
items = []

def flatten(data):
    if isinstance(data, dict):
        match data:
            # We are searching for a level that contains both an `iid` and `Invnum` key.
            case {'iid': id, 'Invnum': remainder}:
                for each in remainder:
                    entry_row = dict(**entry, iid=id, Invnum=each)
                    entry_row.update(remainder[each])
                    items.append(entry_row)
            case _:
                for key, value in data.items():
                    flatten(value)

# We flatten the data, such that the `items` variable will hold consistent entries
flatten(data)

# Transfer to pandas dataframe, and reorder the values for easy comparison.
df = pd.DataFrame(items)
df = df[['iid', 'Invnum', '@type', 'execId', 'CId', 'AId', 'df', 'type']]
print(df.to_string(index=False))

Output:

iid Invnum      @type execId CId AId df type
 B1    B11 /test_data     42  42 BAZ        
 B1    B12 /test_data          8 123        
 B2    B21 /test_data    215 253     []    F
 B2    B22 /test_data     10         []    F

Note:

All entries have been turned into strings, since I am using '' for empty values.
I heavily rely on the above made assumptions, in case they are incorrect, the answer will not match your expectation.
I am using Structural pattern matching, which is introduced in python 3.10.

Upvotes: 1

How can I explode a nested dictionary into a dataframe?

Answers (1)

Related Questions