Reputation: 189
I have a JSON file in S3 containing multiple JSON objects, whose structure resembles the following.
"{"category" : "random", "a": 1, "b": 2, "c": 3}"
"{"category" : "automobile", "brand": "bmw", "car": "x3", "price": "100000"}"
"{"category" : "random", "a": 7, "b": 8, "c": 9}"
As you can see, this file contains multiple JSON objects, one per line, each wrapped in quotes as a string.
I want to read this file from S3 and parse it, so I tried the following.
import boto3
import json
s3 = boto3.resource('s3')
content_object = s3.Object(bucket,key)
file_content = content_object.get()['Body'].read().decode('utf-8')
json_content = json.loads(file_content)
print(json_content['Details'])
But I got the following error: json.decoder.JSONDecodeError: Extra data
I think this happens because the file contains multiple JSON objects, each wrapped inside a string.
I think I could parse it if I managed to strip the quotation marks at the start and end of each JSON object.
But I am not sure whether this is the only (or the best) way to do it, or whether it can be done efficiently.
Is there any way to parse this file?
Note: the objects need not all have the same attributes, and although the sample above contains only 3 objects, I would like this to work at a much larger scale.
Upvotes: 2
Views: 2339
Reputation: 54620
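The file is not a single JSON document but one JSON object per line, so json.loads on the whole content fails with "Extra data" after the first object. You'll have to parse it line by line. This will produce a list of objects.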
import boto3
import json
s3 = boto3.resource('s3')
content_object = s3.Object(bucket,key)
file_content = content_object.get()['Body'].read().decode('utf-8')
# Parse each non-empty line as its own JSON object
json_content = [json.loads(line) for line in file_content.splitlines() if line.strip()]
print(json_content)
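If the lines in the file really are wrapped in an extra pair of quotes, as in the sample in the question, json.loads will reject them as they stand. Below is a minimal sketch that strips that outer pair before parsing and yields objects one at a time, which may help with larger files. The helper name iter_json_lines is hypothetical, and it assumes each object sits on exactly one line. It accepts any iterable of str or bytes lines, so it could presumably be fed from content_object.get()['Body'].iter_lines() instead of reading the whole body into memory.

```python
import json


def iter_json_lines(lines):
    """Yield one parsed object per non-empty line (hypothetical helper)."""
    for raw in lines:
        # Accept both bytes (e.g. from a streamed body) and str lines.
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: a line shaped like "{...}" carries literal wrapper
        # quotes that are not valid JSON escaping, so drop the outer pair.
        if line.startswith('"{') and line.endswith('}"'):
            line = line[1:-1]
        yield json.loads(line)


# Small in-memory sample standing in for the S3 object body.
sample = [
    b'"{"category" : "random", "a": 1, "b": 2, "c": 3}"',
    b'',
    b'{"category" : "automobile", "brand": "bmw", "car": "x3", "price": "100000"}',
]
records = list(iter_json_lines(sample))
print(records)
```

Because it is a generator, each object can be processed and discarded as it arrives rather than held in one large list.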
Upvotes: 3