Reputation: 167
I have a Lambda function that takes a CSV file uploaded to an S3 bucket, converts it to JSON, and saves the result to another bucket. Here is my code:
import json
import os
import boto3
import csv

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        file_key = record['s3']['object']['key']
        s3 = boto3.client('s3')
        csvfile = s3.get_object(Bucket=bucket, Key=file_key)
        csvcontent = csvfile['Body'].read().split(b'\n')
        data = []
        csv_file = csv.DictReader(csvcontent)
        print(csv_file)
        data = list(csv_file)
        os.chdir('/tmp')
        JSON_PATH = file_key[6:] + ".json"
        print(data)
        with open(JSON_PATH, 'w') as output:
            json.dump(data, output)
        bucket_name = 'xxx'
        s3.upload_file(JSON_PATH, bucket_name, JSON_PATH)
The problem is that although the file converts to JSON when I test this locally on my machine, running the Lambda function produces the following error:
[ERROR] Error: iterator should return strings, not bytes (did you open the file in text mode?)
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 19, in lambda_handler
    data = list(csv_file)
  File "/var/lang/lib/python3.7/csv.py", line 111, in __next__
    self.fieldnames
  File "/var/lang/lib/python3.7/csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
Can someone help me understand why this happens? I have been trying to find a solution for a while and I don't understand what the problem is. I appreciate any help you can provide.
Upvotes: 0
Views: 7110
Reputation: 1
Just a small tweak to make it work right:
csvcontent = csvfile['Body'].read().decode().split('\n')
Upvotes: 0
Reputation: 1006
The result of read() in s3.get_object() is bytes, not strings. csv.DictReader() expects strings instead of bytes, and that's why it is failing.
You can decode the result of read() into strings using the decode() function with the correct encoding. The following would be a fix:
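Change this:
csvcontent = csvfile['Body'].read().split(b'\n')
to this:
csvcontent = csvfile['Body'].read().decode('utf-8').splitlines()
The splitlines() call matters: csv.DictReader() iterates over its argument line by line, so passing it one undivided string would make it treat every character as a separate line.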
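To try the decoding step without touching S3, here is a minimal sketch. The helper name csv_bytes_to_json and the sample data are made up for illustration, and it assumes the file is UTF-8 encoded; in the Lambda, the bytes would come from csvfile['Body'].read() instead.

```python
import csv
import io
import json


def csv_bytes_to_json(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw CSV bytes and return the rows as a JSON array string.

    Hypothetical helper for illustration; not part of boto3 or the csv module.
    """
    # Decode first: csv.DictReader needs an iterable of str lines, not bytes.
    text = raw.decode(encoding)
    # io.StringIO wraps the text in a file-like object; newline="" leaves
    # line-ending handling to the csv module.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return json.dumps(list(reader))


# Sample bytes standing in for the S3 object body (assumed data).
sample = b"name,age\nAda,36\nLinus,54\n"
print(csv_bytes_to_json(sample))
```

If the file starts with a byte order mark, which some spreadsheet exports add, passing encoding="utf-8-sig" instead should strip it so it does not end up in the first column name.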
A good way to debug these problems is to use the type() function to check what type your variable is. In your case, csvcontent is a list after the split(), so check one of its elements: print(type(csvcontent[0])) would show that each line is indeed of type bytes.
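As a rough illustration of that kind of check, the snippet below reproduces the failure locally. The raw value is made-up sample data standing in for what read() returns from S3.

```python
import csv

# Made-up bytes standing in for csvfile['Body'].read().
raw = b"name,age\nAda,36\n"
lines = raw.split(b"\n")

print(type(raw))       # <class 'bytes'>
print(type(lines))     # <class 'list'>
print(type(lines[0]))  # <class 'bytes'>

error_message = None
try:
    # Feeding bytes lines to DictReader triggers the error from the question.
    list(csv.DictReader(lines))
except csv.Error as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message can differ between Python versions, but it should still say that the iterator is expected to return strings.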
Upvotes: 2