Fluxy

Reputation: 2978

How to flatten a nested JSON file to retrieve the expected fields

I have the following JSON file:

[ 
  {'docType': 'custom',
   'fields': 
    {
      'general_info': None,
      'power': 20,
      'safety': 
       {
         'boundingBox': [2.375,9.9,4.98,9.9,4.98,10.245,2.375,10.245],
         'confidence': 0.69,
         'page': 22,
         'text': 'bla-bla-bla',
         'type': 'string',
         'valueString': 'bla-bla-bla'
       },
       'replacement': 
        {
          'boundingBox': [2.505,2.51,2.54,2.51,2.54,3.425,2.505,3.425],
          'confidence': 0.262,
          'page': 7,
          'text': 'bla-bla-bla',
          'type': 'string',
          'valueString': 'bla-bla-bla'
         },
        'document_id': 'x123'
     } 
   }
]

I want to go through all field values and extract the text from the nested fields. The expected result is as follows:

{
   'labels': 
    {
       'general_info': None,
       'power': 20,
       'safety': 'bla-bla-bla',
       'replacement': 'bla-bla-bla',
       'document_id': 'x123'
     } 
}

How can I flatten my JSON file and get the expected result?

This is what I have tried so far:

import json

json_object = json.load(raw_json)

fields = {}
for field in json_object:
    for attribute, value in field.items():
        fields[attribute] = value

fields_json = json.dumps(fields, indent = 4)

However, I don't know how to recurse into the nested fields.

Upvotes: 2

Views: 156

Answers (3)

klv0000

Reputation: 174

You should use recursion to walk through the dictionary. My solution would be:

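import json

with open('raw_json', 'r') as j:
    d = json.load(j)

def dict_walker(obj, key=None):
    # Recurse into dictionaries; print anything else as a leaf value
    if isinstance(obj, dict):
        for k in obj:
            dict_walker(obj[k], k)
    else:
        print(key, ':', obj)

# The top level of the file is a list of records, so walk each one
for record in d:
    dict_walker(record)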

OUT:

docType : custom
general_info : None
power : 20
boundingBox : [2.375, 9.9, 4.98, 9.9, 4.98, 10.245, 2.375, 10.245]
confidence : 0.69
page : 22
text : bla-bla-bla
type : string
valueString : bla-bla-bla
boundingBox : [2.505, 2.51, 2.54, 2.51, 2.54, 3.425, 2.505, 3.425]
confidence : 0.262
page : 7
text : bla-bla-bla
type : string
valueString : bla-bla-bla
document_id : x123
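
In the output above, keys such as boundingBox and text appear twice with no indication of which parent field they belong to. As a minimal sketch (the name walk_leaves and the shortened sample record are assumptions for illustration, not part of the answer), the same recursion can carry the path of parent keys along:

```python
def walk_leaves(obj, path=()):
    """Yield (dotted_path, value) for every non-dict value under obj."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            # Extend the path with the current key and recurse
            yield from walk_leaves(value, path + (key,))
    else:
        yield '.'.join(path), obj

# Hypothetical, shortened record shaped like the data in the question
record = {
    'docType': 'custom',
    'fields': {
        'power': 20,
        'safety': {'page': 22, 'text': 'bla-bla-bla'},
    },
}

for dotted_path, value in walk_leaves(record):
    print(dotted_path, ':', value)
# docType : custom
# fields.power : 20
# fields.safety.page : 22
# fields.safety.text : bla-bla-bla
```

With the full path available, picking out only the entries that end in .text becomes a simple string check.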

Upvotes: 1

Umutambyi Gad

Reputation: 4101

After loading the file as a Python list, loop over it and take the dictionary stored under the key 'fields' in each item. Then loop over that dictionary's keys and values. When a value is itself a dictionary, loop over it, find the key 'text', and store its value under the parent key. Otherwise, store the value as it is.

Example

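from pprint import pprint

# content is the list loaded from the JSON file, e.g. with json.load
res = {}
for sub in content:
    for x, y in sub['fields'].items():
        if isinstance(y, dict):
            # Nested field: keep only its 'text' value under the parent key
            for i, e in y.items():
                if i == 'text':
                    res[x] = e
        else:
            res[x] = y

final = {}
final['label'] = res
pprint(final)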

output

{'label': {'document_id': 'x123',
           'general_info': None,
           'power': 20,
           'replacement': 'bla-bla-bla',
           'safety': 'bla-bla-bla'}}
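
Note that the loop above writes every record into one shared res dictionary, so with more than one record, later values overwrite earlier ones. A possible variant that keeps one result per record is sketched below; the helper name text_or_value and the two-record sample are assumptions made for illustration, and the key is spelled 'labels' to match the question.

```python
def text_or_value(value):
    """Return value['text'] when value is a dict that has one, else value itself."""
    if isinstance(value, dict) and 'text' in value:
        return value['text']
    return value

# Hypothetical two-record input shaped like the data in the question
content = [
    {'docType': 'custom',
     'fields': {'power': 20,
                'safety': {'page': 22, 'text': 'bla-bla-bla'},
                'document_id': 'x123'}},
    {'docType': 'custom',
     'fields': {'power': 35,
                'safety': {'page': 3, 'text': 'more text'},
                'document_id': 'x124'}},
]

# One {'labels': ...} dictionary per record, built with a dict comprehension
results = [
    {'labels': {name: text_or_value(value)
                for name, value in record['fields'].items()}}
    for record in content
]
```

Like the loop it is based on, this only looks one level deep; deeper nesting would need a recursive approach.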

Upvotes: 1

James

Reputation: 36691

You can write a recursive function. It should call itself when a value is a dictionary.

This is an example.

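def flatten_fields(d):
    out = {}
    for k, v in d.items():
        if isinstance(v, dict):
            # Nested dictionary: flatten it recursively
            out[k] = flatten_fields(v)
        elif k == 'text':
            # A 'text' entry replaces the whole dictionary it belongs to
            return v
        elif isinstance(v, list):
            # Lists (e.g. boundingBox) are dropped
            continue
        else:
            out[k] = v
    return out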

To run it, iterate through each dictionary in the json_object. You only have one example above, but this is how:

labels = []
for d in json_object:
    labels.append({'labels': flatten_fields(d.get('fields', {}))})

labels
# returns:
[{'labels': {'general_info': None,
   'power': 20,
   'safety': 'bla-bla-bla',
   'replacement': 'bla-bla-bla',
   'document_id': 'x123'}}]
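
For completeness, here is a rough end-to-end sketch that goes from JSON text to JSON text. The data in the question is written in Python literal notation (single quotes, None), which the json module does not parse; the sample below is a shortened, hypothetical version in strict JSON syntax (double quotes, null). The function is repeated only so that the snippet runs on its own.

```python
import json

def flatten_fields(d):
    # Same logic as the function above, repeated so this snippet is self-contained
    out = {}
    for k, v in d.items():
        if isinstance(v, dict):
            out[k] = flatten_fields(v)
        elif k == 'text':
            return v
        elif isinstance(v, list):
            continue
        else:
            out[k] = v
    return out

# Hypothetical, shortened input in strict JSON syntax
raw = '''
[
  {"docType": "custom",
   "fields": {
     "general_info": null,
     "power": 20,
     "safety": {"boundingBox": [2.375, 9.9], "page": 22, "text": "bla-bla-bla"},
     "document_id": "x123"
   }
  }
]
'''

json_object = json.loads(raw)  # JSON null becomes Python None
labels = [{'labels': flatten_fields(d.get('fields', {}))} for d in json_object]

# Serialize back to JSON; None is written out as null again
print(json.dumps(labels, indent=4))
```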

Upvotes: 1
