Reputation: 51
I have the following string:
...some random text...
{
"1":"one",
"2":"two",
"3":{
"31":{
"311":"threeoneone",
"312":"threeonetwo",
"313":"threeonethree"
}
},
"4":{
"41":"fourone",
"42":"fourtwo",
"43":"fourthree"
},
"5":"five",
"6":"six"
}
...some more random text...
How can I extract the JSON from this? This is what I want to get:
{
    "1": "one",
    "2": "two",
    "3": {
        "31": {
            "311": "threeoneone",
            "312": "threeonetwo",
            "313": "threeonethree"
        }
    },
    "4": {
        "41": "fourone",
        "42": "fourtwo",
        "43": "fourthree"
    },
    "5": "five",
    "6": "six"
}
Is there a Pythonic way of getting this done?
Upvotes: 3
Views: 1433
Reputation: 107085
A more robust way to find JSON objects in a file with mixed content, without making any assumptions about that content (the non-JSON text may contain unpaired curly brackets, the JSON may contain strings with unpaired curly brackets, there may be multiple JSON objects, etc.), is to iteratively try parsing every substring that starts with a curly bracket { using the json.JSONDecoder.raw_decode method, which allows extra data after a JSON document. Since this method takes a starting index as a second argument, which the regular decode method does not, we can supply that index through a function closure instead. And since this method also returns the index at which the valid JSON document ends, we can use that index as the starting point for finding the next substring that starts with {:
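import json

def RawJSONDecoder(index):
    # Build a JSONDecoder subclass whose decode() starts parsing at `index`
    # (captured by the closure) and records where the document ended.
    class _RawJSONDecoder(json.JSONDecoder):
        end = None
        def decode(self, s, *_):
            # raw_decode returns (object, end index) and ignores trailing text
            data, self.__class__.end = self.raw_decode(s, index)
            return data
    return _RawJSONDecoder

def extract_json(s, index=0):
    # Try each '{' in turn; after a successful parse, resume after the object
    while (index := s.find('{', index)) != -1:
        try:
            yield json.loads(s, cls=(decoder := RawJSONDecoder(index)))
            index = decoder.end
        except json.JSONDecodeError:
            # Not valid JSON at this '{'; move on to the next character
            index += 1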
So that:
s = '''...some {{bad brackets} and empty brackets {} <= still valid JSON though...
{
"1":"one",
"2":"two",
"3":{
"31":{
"311":"threeoneone",
"312":"threeonetwo",
"313":"threeonethree"
}
},
"4":{
"41":"fourone",
"42":"fourtwo",
"43":"fourthree"
},
"5":"five",
"6":"six"
}
...some more random text...'''
print(*extract_json(s), sep='\n')
outputs:
{}
{'1': 'one', '2': 'two', '3': {'31': {'311': 'threeoneone', '312': 'threeonetwo', '313': 'threeonethree'}}, '4': {'41': 'fourone', '42': 'fourtwo', '43': 'fourthree'}, '5': 'five', '6': 'six'}
Demo: https://ideone.com/4aat8z
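A possibly simpler variant of the same idea, sketched under the assumption of Python 3.8+ (for the := operator): call raw_decode directly on a single json.JSONDecoder instance. It already accepts the start index as its second argument, so the decoder-class factory is not strictly needed. iter_json_objects is a hypothetical name and the sample string is made up for illustration.

```python
import json


def iter_json_objects(text):
    """Yield every JSON object embedded in text, left to right.

    Hypothetical helper: same scanning idea as extract_json, but it calls
    raw_decode on one JSONDecoder instance instead of going through
    json.loads with a custom decoder class.
    """
    decoder = json.JSONDecoder()
    index = 0
    while (index := text.find('{', index)) != -1:
        try:
            # raw_decode parses one document starting at index and returns
            # the object plus the index just past its end; trailing text is ignored.
            obj, end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            index += 1  # not valid JSON here; try the next '{'
        else:
            yield obj
            index = end  # resume scanning after the parsed object


sample = 'noise {{bad} and {} then {"1": "one", "3": {"31": {"311": "x"}}} tail'
for obj in iter_json_objects(sample):
    print(json.dumps(obj, indent=4))
```

json.dumps(obj, indent=4) prints each object in the indented form shown in the question.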
Upvotes: 6
Reputation: 5833
You could use a regex to locate the JSON, for example:
import re
import json
text = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis lacinia efficitur metus, eget finibus leo venenatis non. Sed id massa luctus, hendrerit mauris id, auctor tortor.
{
"1":"one",
"2":"two",
"3":{
"31":{
"311":"threeoneone",
"312":"threeonetwo",
"313":"threeonethree"
}
},
"4":{
"41":"fourone",
"42":"fourtwo",
"43":"fourthree"
},
"5":"five",
"6":"six"
}
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis lacinia efficitur metus, eget finibus leo venenatis non. Sed id massa luctus, hendrerit mauris id, auctor tortor.Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis lacinia efficitur metus, eget finibus leo venenatis non. Sed id massa luctus, hendrerit mauris id, auctor tortor.
"""
result = re.search(r'[a-zA-Z0-9 ,.\n]+(\{[a-zA-Z0-9 \":\{\},\n]+\})[a-zA-Z0-9 ,.\n]+', text)
if result:
    json_string = result.group(1)
    json_data = json.loads(json_string)
    print(json_data)
else:
    # re.search returns None (not an IndexError) when nothing matches
    print("No json found!")
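The character class in that regex rejects any JSON containing characters it does not list, such as a hyphen or a decimal point. A looser sketch, assuming the surrounding text contains no curly brackets and holds exactly one JSON object, is a single greedy match from the first { to the last }. The sample text is made up for illustration.

```python
import json
import re

text = """Some intro text without braces.
{"1": "one", "2": "two", "3": {"31": {"311": "threeoneone"}}}
Some trailing text, also without braces."""

# Greedy match from the first '{' to the last '}'; re.DOTALL lets '.'
# span newlines. Only safe if the surrounding text has no braces and
# there is exactly one JSON object.
match = re.search(r'\{.*\}', text, re.DOTALL)
if match:
    json_data = json.loads(match.group(0))
    print(json_data)
else:
    print("No json found!")
```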
Upvotes: 0
Reputation: 897
Assuming the JSON is not malformed, that everything enclosed in curly braces is a JSON object, and that each top-level object's opening and closing braces sit alone on their own lines:
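jsons = []
parsing_json_flag = False
parse_to_json = ""
with open(f) as o:  # f is the path to the input file
    for line in o:
        stripped = line.rstrip()  # drop the trailing newline before comparing
        if stripped == "{":
            parsing_json_flag = True
        if parsing_json_flag:
            parse_to_json += line
            if stripped == "}":
                parsing_json_flag = False
                jsons.append(parse_to_json)  # store the object before resetting
                parse_to_json = ""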
Now parse each string in the list jsons with your favorite JSON parsing library (for example, json.loads from the standard library).
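A self-contained sketch of the whole line-based approach, run over an in-memory string instead of a file and finished with json.loads. It assumes each top-level object opens and closes with a brace alone on an unindented line, while nested braces are indented or share a line with other text; the sample text is made up for illustration.

```python
import json

text = """...some random text...
{
    "1": "one",
    "3": {
        "31": {"311": "threeoneone"}
    },
    "5": "five"
}
...some more random text..."""

jsons = []
current = []
inside = False
for line in text.splitlines(keepends=True):
    stripped = line.rstrip()
    if stripped == "{":  # unindented opening brace starts an object
        inside = True
    if inside:
        current.append(line)
        if stripped == "}":  # unindented closing brace ends it
            inside = False
            jsons.append("".join(current))
            current = []

# Parse every collected string with the standard-library json module
parsed = [json.loads(j) for j in jsons]
print(parsed)
```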
Upvotes: 0