Parsing a JSON file from S3 that contains multiple JSON objects wrapped in strings

Question

I have a JSON file in S3 containing multiple JSON objects, whose structure resembles the sample below.

"{"category" : "random", "a": 1, "b": 2, "c": 3}"
"{"category" : "automobile", "brand": "bmw", "car": "x3", "price": "100000"}"
"{"category" : "random", "a": 7, "b": 8, "c": 9}"

As you can see, the file contains multiple JSON objects, one per line, each wrapped in a string.

I want to read this file from S3 and parse it, so I tried the following.

import boto3
import json

s3 = boto3.resource('s3')

content_object = s3.Object(bucket,key)
file_content = content_object.get()['Body'].read().decode('utf-8')
json_content = json.loads(file_content)
print(json_content['Details'])

But I got the following error: json.decoder.JSONDecodeError: Extra data

I think this happens because the file contains multiple JSON objects, each wrapped inside a string.

I think I could parse it if I managed to strip the quotation marks at the start and end of each JSON object.

But I am not sure whether this is the only (or the best) way to do it, or whether I can do it efficiently.

Is there any way to parse this file?

Note: the objects need not all have the same attributes, and although the sample above contains only 3 objects, I would like this to scale to much larger files.

Tim Roberts · Accepted Answer

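You'll have to parse it line by line, because json.loads accepts only a single JSON document and raises "Extra data" when more text follows it. This will produce a list of objects, one per line.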

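import boto3
import json

s3 = boto3.resource('s3')

# bucket and key are the S3 bucket name and object key, defined elsewhere
content_object = s3.Object(bucket, key)
file_content = content_object.get()['Body'].read().decode('utf-8')

# Parse each non-blank line as its own JSON document
json_content = [json.loads(line) for line in file_content.splitlines() if line.strip()]
print(json_content)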
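
If the lines really are wrapped in unescaped outer quotes exactly as in the question's sample, json.loads on a raw line would still fail, since it reads `"{"` as a complete string and then finds extra data. The following is a minimal sketch of one way to handle that, not a definitive implementation. The helper names `parse_wrapped_line` and `parse_wrapped_lines` are hypothetical, and the sketch assumes one object per line, with the wrapper either unescaped (as in the sample), properly escaped, or absent.

```python
import json


def parse_wrapped_line(line):
    """Parse one line holding a JSON object, optionally wrapped in double quotes."""
    text = line.strip()
    if not text:
        return None  # blank line: nothing to parse
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        # Unescaped wrapper as in the sample: "{"a": 1}" -> drop the outer quotes.
        if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
            return json.loads(text[1:-1])
        raise
    # A properly escaped wrapper ("{\"a\": 1}") decodes to a str; decode it again.
    if isinstance(value, str):
        value = json.loads(value)
    return value


def parse_wrapped_lines(lines):
    """Lazily yield one parsed object per non-blank line."""
    for line in lines:
        obj = parse_wrapped_line(line)
        if obj is not None:
            yield obj


# Sample lines modeled on the question (assumed format, not real S3 data).
sample = [
    '"{"category" : "random", "a": 1, "b": 2, "c": 3}"',
    '"{"category" : "automobile", "brand": "bmw", "car": "x3", "price": "100000"}"',
    '',
    '{"category" : "random", "a": 7, "b": 8, "c": 9}',
]
records = list(parse_wrapped_lines(sample))
print(records[1]["brand"])
```

Because `parse_wrapped_lines` is a generator, it can also be fed lines one at a time. For large files, the lines could probably come from `content_object.get()['Body'].iter_lines()` instead of reading the whole body at once; that iterator yields bytes, so each line would need `.decode('utf-8')` first.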
