Forscher
Forscher

Reputation: 83

flatten doubly nested dictionary inside list using json_normalize

response[0]={'@type': 'g:Vertex',
             '@value': {'id': 'account-2199023262994',
                        'label': 'Account',
                        'properties': {'account_number': [{'@type': 'g:VertexProperty',
                                                           '@value': {'id': {'@type': 'g:Int32',
                                                                             '@value': 544016139},
                                                                      'value': '0002-1990-2326-2994',
                                                                      'label': 'account_number'}}],
                                       'last_name': [{'@type': 'g:VertexProperty',
                                                      '@value': {'id': {'@type': 'g:Int32',
                                                                        '@value': -1616372909},
                                                                 'value': 'Law-Yone',
                                                                 'label': 'last_name'}}],
                                       'first_name': [{'@type': 'g:VertexProperty',
                                                       '@value': {'id': {'@type': 'g:Int32',
                                                                         '@value': -451458550},
                                                                  'value': 'Eric',
                                                                  'label': 'first_name'}}]}}}

I have a nested dictionary within the list which is a part of another nested dictionary. This whole above code snippet is itself an element of list response.

I tried to flatten this using:

pd.json_normalize(response, meta = ['@type', ['@value', 'id'], ['@value', 'label'], ['@value', 'properties']])

The output of above code is: enter image description here

I am expecting to flatten the internal lists of dictionaries also.

Can anyone help me out here? Do I need to iterate through the dataframe or are there any methods available directly?

EDIT: Expected Output as below enter image description here

Upvotes: 2

Views: 232

Answers (1)

Ajax1234
Ajax1234

Reputation: 71451

You can use a recursive generator function to get the full flattened result, row by row, and then use collections.defaultdict to group the rows on the account ID. From there, you can grab the desired keys to build your final pd.DataFrame:

import pandas as pd, collections
data = {'@type': 'g:Vertex', '@value': {'id': 'account-2199023262994', 'label': 'Account', 'properties': {'account_number': [{'@type': 'g:VertexProperty', '@value': {'id': {'@type': 'g:Int32', '@value': 544016139}, 'value': '0002-1990-2326-2994', 'label': 'account_number'}}], 'last_name': [{'@type': 'g:VertexProperty', '@value': {'id': {'@type': 'g:Int32', '@value': -1616372909}, 'value': 'Law-Yone', 'label': 'last_name'}}], 'first_name': [{'@type': 'g:VertexProperty', '@value': {'id': {'@type': 'g:Int32', '@value': -451458550}, 'value': 'Eric', 'label': 'first_name'}}]}}}
def flatten(d, c = [], p = []):
   t, f = [], []
   for a, b in d.items():
      (t if not isinstance(b, (dict, list)) else f).append((p+[a], b))
   if not f:
      yield {'.'.join(a):b for a, b in (c+t)}
   else:
      for a, b in f:
         if isinstance(b, dict):
             yield from flatten(b, c=c+t, p = a)
         else:
             for i in b:
                yield from flatten(i, c=c+t, p = a)

d = collections.defaultdict(dict)
for i in flatten(data):
   d[i['@value.id']].update(i)

kv = [['@type', '@type'], ['@value.id', '@value.id'], ['@value.label', '@value.label'], ['@[email protected].@value', 'account_number.id'], ['@[email protected]', 'account_number.value'], ['@[email protected].@value', 'last_name.id'], ['@[email protected]', 'last_name.value'], ['@[email protected].@value', 'first_name.id'], ['@[email protected]', 'first_name.value']]
df = pd.DataFrame([{j:b[k] for k, j in kv} for b in d.values()])

Output:

      @type              @value.id @value.label  ...  last_name.value first_name.id  first_name.value
0  g:Vertex  account-2199023262994      Account  ...         Law-Yone    -451458550              Eric

[1 rows x 9 columns]

Upvotes: 1

Related Questions