Reputation: 742
I am trying to get the data from a file in Amazon S3, manipulate the content and then save it to another bucket.
import json
import urllib.parse
import boto3

print('Loading function')

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    file_name = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    s3_object = s3.get_object(Bucket=bucket, Key=file_name)
    file_content = s3_object['Body'].read()
    initial_data = json.load(file_content)
    # some file manipulation comes here
    data = json.dumps(initial_data, ensure_ascii=False)
    s3.put_object(Bucket="new bucket name", Body=data, Key=file_name)
The error message leads me to think that this has something to do with encoding:
Response:
{
"errorMessage": "'bytes' object has no attribute 'read'",
"errorType": "AttributeError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 25, in lambda_handler\n data_initlal = json.load(file_content)\n",
" File \"/var/lang/lib/python3.8/json/__init__.py\", line 293, in load\n return loads(fp.read(),\n"
]
}
Additionally, if I remove the following line from my code:
initial_data = json.load(file_content)
I get the error:
Response:
{
"errorMessage": "Object of type bytes is not JSON serializable",
"errorType": "TypeError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 29, in lambda_handler\n data=json.dumps(file_content, ensure_ascii=False)\n",
" File \"/var/lang/lib/python3.8/json/__init__.py\", line 234, in dumps\n return cls(\n",
" File \"/var/lang/lib/python3.8/json/encoder.py\", line 199, in encode\n chunks = self.iterencode(o, _one_shot=True)\n",
" File \"/var/lang/lib/python3.8/json/encoder.py\", line 257, in iterencode\n return _iterencode(o, 0)\n",
" File \"/var/lang/lib/python3.8/json/encoder.py\", line 179, in default\n raise TypeError(f'Object of type {o.__class__.__name__} '\n"
]
}
The file that I am trying to edit is in JSON format, and the output should also be JSON.
Upvotes: 3
Views: 1293
Reputation: 269091
This line:
initial_data = json.load(file_content)
Should be:
initial_data = json.loads(file_content)
Alternatively, replace these two lines:
file_content = s3_object['Body'].read()
initial_data = json.load(file_content)
with:
initial_data = json.load(s3_object['Body'])
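The difference is json.load() vs json.loads(): json.load() expects a file-like object and calls .read() on it, while json.loads() expects the content itself (a str or bytes value). Since file_content already holds the bytes returned by .read(), it has to go through json.loads().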
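To illustrate the distinction, here is a minimal sketch that needs no AWS access. It assumes an in-memory io.BytesIO stream as a stand-in for s3_object['Body'] (the real object is a botocore StreamingBody, which also exposes read()), and the sample payload is made up for the example.

```python
import io
import json

# Hypothetical payload standing in for the contents of the S3 object.
raw = b'{"name": "caf\xc3\xa9", "count": 1}'

# json.loads() parses content that is already in memory (str or bytes).
from_bytes = json.loads(raw)

# json.load() expects a file-like object and calls .read() on it itself.
body = io.BytesIO(raw)  # assumed stand-in for s3_object['Body']
from_stream = json.load(body)

# Passing bytes to json.load() reproduces the AttributeError from the question.
try:
    json.load(raw)
except AttributeError as exc:
    error_message = str(exc)
```

Both routes yield the same dictionary; which one to use depends only on whether you have already called .read() on the body.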
Upvotes: 5
Reputation: 572
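The file_content that you read is a bytes object containing UTF-8 encoded text. You can decode it explicitly before parsing it as JSON. (On Python 3.6 and later json.loads() also accepts UTF-8 bytes directly, so on the Python 3.8 runtime shown in the traceback the decode is optional, but it makes the encoding explicit.)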
Try this:
initial_data = json.loads(file_content.decode('utf-8'))
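As a rough sketch of the full round trip, the snippet below decodes, parses, and re-serializes a made-up payload. The file_content value is a hypothetical stand-in for what s3_object['Body'].read() returns, and the final encode step is one possible way to prepare the Body for put_object, not something the question's code requires.

```python
import json

# Hypothetical bytes, standing in for s3_object['Body'].read().
file_content = b'{"city": "M\xc3\xbcnchen"}'

# Explicit decode before parsing, as suggested above.
decoded = json.loads(file_content.decode('utf-8'))

# On Python 3.6+ json.loads() also accepts UTF-8 bytes directly.
direct = json.loads(file_content)

# json.dumps() needs parsed Python data; handing it the raw bytes raises
# "Object of type bytes is not JSON serializable", the second error in the question.
data = json.dumps(decoded, ensure_ascii=False)

# Encoding explicitly keeps the non-ASCII characters as UTF-8 in the uploaded body.
body_bytes = data.encode('utf-8')
```

With no manipulation step in between, body_bytes here matches the original file_content byte for byte.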
Upvotes: 0