Reputation: 11
I have a very heavily nested json file with multiple blocks inside it. The following is an excerpt of the file, It has more than 6 levels of nesting like that
{
"title": "main questions",
"type": "static",
"value":
{
"title": "state your name",
"type": "QUESTION",
"locator": "namelocator",
}
}
If anyone can please help me to parse this in a way such that, i can find the title and locator when type = question(because the type may vary across different parts of the file) and that too concurrently(sequential would kill the system considering the scale of the file)
I have been using the following code to get the values of title and locator separately pip install jsonpath(in anaconda terminal)
from jsonpath import JSONPath
import json as js
data = js.load(f)# f is the path to .json file
JSONPath('$.[?(@.type== "QUESTION")].locator').parse(data)
JSONPath('$.[?(@.type== "QUESTION")].title').parse(data)
The problem is: I am getting the list of locators and title, but its all jumbled since there is no way to know the sequence the function parses the file in its been a while since I am stuck with this problem, and the only solution is going across the file to find all type==questions and then looping again to find the locators and titles(which is computationally not really feasible for a huge chunk of files)
Upvotes: 1
Views: 705
Reputation: 5871
The key is to parse once, and treat the objects you find as objects, so you group the correct title and locator together. They are easy to split if you need.
Here's a code sample demonstrating all the various answers I made in comments. I don't know what exact library you're using, but they all seem to implement the same JSONPath, so you can probably use this. Just change the function names and parameter order to fit whatever library you actually have.
from jsonpath import jsonpath
import json
text = """{
"title": "main questions",
"type": "static",
"value":
{
"title": "state your name",
"type": "QUESTION",
"locator": "namelocator"
}
}"""
# use jsonpath to find the question nodes
data = json.loads(text)
questions_parsed = jsonpath(obj=data, expr='$.[?(@.type== "QUESTION")]')
print (questions_parsed)
[{'title': 'state your name', 'type': 'QUESTION', 'locator': 'namelocator'}]
# python code to parse the same structure
def find_questions(data):
if isinstance(data, dict):
if 'type' in data and 'QUESTION' == data['type']:
# TODO: write a dataclass, or validate that it has title and locator
yield data
elif 'value' in data and isinstance(data['value'], dict):
value = data['value']
yield from find_questions(value)
elif isinstance(data, list):
for item in data:
yield from find_questions(item)
questions = [(question['title'], question['locator']) for question in find_questions(json.loads(text))]
Like I said, it's easy to split the one object into separate lists if you need them:
How to unzip a list of tuples into individual lists?
titles, locators = (list(t) for t in zip(*questions))
print(titles)
print(locators)
['state your name']
['namelocator']
I used this implementation:
pip show jsonpath
Name: jsonpath
Version: 0.82
Summary: An XPath for JSON
Home-page: http://www.ultimate.com/phil/python/#jsonpath
Author: Phil Budne
Author-email: [email protected]
License: MIT
Upvotes: 0