Reputation: 2978
I have the following JSON file:
[
    {
        'docType': 'custom',
        'fields': {
            'general_info': None,
            'power': 20,
            'safety': {
                'boundingBox': [2.375, 9.9, 4.98, 9.9, 4.98, 10.245, 2.375, 10.245],
                'confidence': 0.69,
                'page': 22,
                'text': 'bla-bla-bla',
                'type': 'string',
                'valueString': 'bla-bla-bla'
            },
            'replacement': {
                'boundingBox': [2.505, 2.51, 2.54, 2.51, 2.54, 3.425, 2.505, 3.425],
                'confidence': 0.262,
                'page': 7,
                'text': 'bla-bla-bla',
                'type': 'string',
                'valueString': 'bla-bla-bla'
            },
            'document_id': 'x123'
        }
    }
]
I want to go through all the values under 'fields' and, for the nested fields, extract only their text. The expected result is as follows:
{
    'labels': {
        'general_info': None,
        'power': 20,
        'safety': 'bla-bla-bla',
        'replacement': 'bla-bla-bla',
        'document_id': 'x123'
    }
}
How can I flatten my JSON file and get the expected result?
This is what I have tried so far:
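import json

# raw_json is an open file object for the JSON file
json_object = json.load(raw_json)

fields = {}
for field in json_object:
    # This only copies the top-level keys ('docType', 'fields'),
    # so the nested dictionaries are left untouched
    for attribute, value in field.items():
        fields[attribute] = value

fields_json = json.dumps(fields, indent=4)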
However, I don't know how to descend into the nested fields recursively.
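Note that the sample above uses single quotes and None, which is Python literal syntax rather than JSON, so json.load would raise a JSONDecodeError on a file containing exactly that text. If the file really looks like that, here is a minimal sketch that parses it with ast.literal_eval instead. It assumes a shortened copy of the sample held in a hypothetical raw_text string:

```python
import ast
import json

# Hypothetical: a shortened copy of the question's data (single quotes, None)
raw_text = """[{'docType': 'custom',
  'fields': {'general_info': None,
             'power': 20,
             'safety': {'page': 22, 'text': 'bla-bla-bla'},
             'document_id': 'x123'}}]"""

# json.loads rejects single-quoted strings and the bare name None
try:
    json.loads(raw_text)
except json.JSONDecodeError as err:
    print('not valid JSON:', err.msg)

# ast.literal_eval evaluates Python literals only, never arbitrary code
json_object = ast.literal_eval(raw_text)
print(json_object[0]['fields']['safety']['text'])
```

For a real file the same call would be ast.literal_eval(f.read()) on the opened file. If the file is genuine JSON (double quotes, null), plain json.load is the right tool and this step is unnecessary.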
Upvotes: 2
Views: 156
Reputation: 174
You should use recursion to walk through the dictionary. My solution would be:
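import json

with open('raw_json', 'r') as j:
    d = json.load(j)
# print(d)

def dict_walker(obj, key=None):
    # Recurse into dictionaries; print every non-dict value with its key
    if isinstance(obj, dict):
        for key in obj:
            dict_walker(obj[key], key)
    else:
        print(key, ':', obj)

# The file holds a list of documents, so walk each one
for doc in d:
    dict_walker(doc)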
OUT:
docType : custom
general_info : None
power : 20
boundingBox : [2.375, 9.9, 4.98, 9.9, 4.98, 10.245, 2.375, 10.245]
confidence : 0.69
page : 22
text : bla-bla-bla
type : string
valueString : bla-bla-bla
boundingBox : [2.505, 2.51, 2.54, 2.51, 2.54, 3.425, 2.505, 3.425]
confidence : 0.262
page : 7
text : bla-bla-bla
type : string
valueString : bla-bla-bla
document_id : x123
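dict_walker prints each leaf value but loses which parent field it came from, which is exactly what the expected result needs. A minimal variant (the walk generator and the shortened doc below are hypothetical, not part of the answer) yields the full key path instead, so the labels can be rebuilt from it:

```python
def walk(obj, path=()):
    """Yield (path, value) for every non-dict value; path is the
    tuple of keys leading to that value."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from walk(value, path + (key,))
    else:
        yield path, obj

# Hypothetical shortened document with the question's shape
doc = {'docType': 'custom',
       'fields': {'general_info': None,
                  'power': 20,
                  'safety': {'confidence': 0.69, 'text': 'bla-bla-bla'},
                  'document_id': 'x123'}}

labels = {}
for path, value in walk(doc['fields']):
    # path[0] is the field name: keep top-level values directly and,
    # for nested fields, only their 'text' entry
    if len(path) == 1 or path[-1] == 'text':
        labels[path[0]] = value

print(labels)
```

Keeping the whole path makes the selection rule explicit, at the cost of building a tuple per leaf.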
Upvotes: 1
Reputation: 4101
After loading it as a Python list, loop over it to reach the inner dict under the key fields, then loop over that dict's keys and values. Once you find a value whose type is dict, loop over it too and take the inner value whose key is text; store only that value, using the parent key as its key:
from pprint import pprint

# content is the list loaded from the file, e.g. content = json.load(f)
res = {}
for sub in content:
    for x, y in sub['fields'].items():
        if isinstance(y, dict):
            # keep only the nested 'text' value, under the parent key
            for i, e in y.items():
                if i == 'text':
                    res[x] = e
        else:
            res[x] = y

final = {}
final['labels'] = res
pprint(final)

{'labels': {'document_id': 'x123',
            'general_info': None,
            'power': 20,
            'replacement': 'bla-bla-bla',
            'safety': 'bla-bla-bla'}}
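The inner loop that searches for the 'text' key can also be written as a single dict lookup. This is a sketch on a hypothetical shortened content list. One difference: y.get('text') stores None for a nested dict that has no 'text' key, whereas the loop above skips that field entirely:

```python
# Hypothetical shortened input with the question's shape
content = [{'docType': 'custom',
            'fields': {'general_info': None,
                       'power': 20,
                       'safety': {'page': 22, 'text': 'bla-bla-bla'},
                       'replacement': {'page': 7, 'text': 'bla-bla-bla'},
                       'document_id': 'x123'}}]

res = {}
for sub in content:
    # Nested dicts collapse to their 'text'; other values are copied as-is
    res.update({x: y.get('text') if isinstance(y, dict) else y
                for x, y in sub['fields'].items()})

final = {'labels': res}
print(final)
```

As in the loop version, with several documents the later ones overwrite earlier keys in res.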
Upvotes: 1
Reputation: 36691
You can write a recursive function. It should call itself when a value is a dictionary.
This is an example.
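def flatten_fields(d):
    out = {}
    for k, v in d.items():
        if isinstance(v, dict):
            # nested field: recurse into it
            out[k] = flatten_fields(v)
        elif k == 'text':
            # a dict with a 'text' entry collapses to that text
            return v
        elif isinstance(v, list):
            # skip lists such as boundingBox
            continue
        else:
            out[k] = v
    return out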
To run it, iterate through each dictionary in json_object. You only have one document in the example above, but this is how:
labels = []
for d in json_object:
    labels.append({'labels': flatten_fields(d.get('fields', {}))})
labels
# returns:
[{'labels': {'general_info': None,
             'power': 20,
             'safety': 'bla-bla-bla',
             'replacement': 'bla-bla-bla',
             'document_id': 'x123'}}]
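Since the question ends with json.dumps, here is an end-to-end sketch that parses JSON text, flattens it and serialises the result again. The raw string is a hypothetical, shortened, genuinely valid JSON version of the sample. The flatten_fields here is a restructured variant that checks for 'text' up front instead of returning from inside the loop; it should behave the same on data shaped like the question's:

```python
import json

def flatten_fields(d):
    # Variant: a dict carrying a 'text' entry collapses to that text
    if 'text' in d:
        return d['text']
    out = {}
    for k, v in d.items():
        if isinstance(v, dict):
            out[k] = flatten_fields(v)
        elif not isinstance(v, list):  # drop lists such as boundingBox
            out[k] = v
    return out

# Hypothetical JSON text with the question's shape (double quotes, null)
raw = '''[{"docType": "custom",
  "fields": {"general_info": null, "power": 20,
             "safety": {"boundingBox": [2.375, 9.9], "page": 22, "text": "bla-bla-bla"},
             "document_id": "x123"}}]'''

json_object = json.loads(raw)
labels = [{'labels': flatten_fields(d.get('fields', {}))} for d in json_object]

# Serialise back to JSON; Python None is written out as null
print(json.dumps(labels[0], indent=4))
```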
Upvotes: 1