Angara kilkiri

Reputation: 167

How to convert CSV to JSON with Python on AWS Lambda?

I have a Lambda function that takes a CSV file uploaded to a bucket, converts it to JSON, and saves it to another bucket. Here is my code:

import json
import os
import boto3
import csv

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        file_key = record['s3']['object']['key']
        s3 = boto3.client('s3')
        csvfile = s3.get_object(Bucket=bucket, Key=file_key)
        csvcontent = csvfile['Body'].read().split(b'\n')

        data = []
        csv_file = csv.DictReader(csvcontent)
        print(csv_file)
        data = list(csv_file)

        os.chdir('/tmp')
        JSON_PATH = file_key[6:] + ".json"
        print(data)
        with open(JSON_PATH, 'w') as output:
          json.dump(data, output)
          bucket_name = 'xxx'
          s3.upload_file(JSON_PATH, bucket_name, JSON_PATH)

The problem is that, although the file converts to JSON when I test this locally on my machine, running the Lambda function produces the following error:

[ERROR] Error: iterator should return strings, not bytes (did you open the file in text mode?)
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 19, in lambda_handler
    data = list(csv_file)
  File "/var/lang/lib/python3.7/csv.py", line 111, in __next__
    self.fieldnames
  File "/var/lang/lib/python3.7/csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)

Can someone help me understand why this happens? I have been trying to find a solution for a while and I don't understand what the problem is. I appreciate any help you can provide.

Upvotes: 0

Views: 7110

Answers (2)

vlad kot

Reputation: 1

Just a small tweak to make it work right:

csvcontent = csvfile['Body'].read().decode().split('\n')

Upvotes: 0

ashiina

Reputation: 1006

The result of calling read() on the Body returned by s3.get_object() is bytes, not a string. csv.DictReader() expects strings, not bytes, and that's why it is failing.
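To see the difference without touching S3, here is a minimal sketch that feeds csv.DictReader() first bytes and then decoded text. The sample data and the try_parse helper are made up for illustration; raw stands in for whatever csvfile['Body'].read() returns.

```python
import csv

# Hypothetical stand-in for csvfile['Body'].read(), which returns bytes.
raw = b"name,age\nalice,30\n"


def try_parse(lines):
    """Parse an iterable of lines, returning the rows or the error text."""
    try:
        return list(csv.DictReader(lines))
    except csv.Error as exc:
        return f"csv.Error: {exc}"


# Lines are still bytes here, so the csv module refuses them.
bytes_result = try_parse(raw.split(b"\n"))

# Decoding first yields str lines, which the csv module accepts.
text_result = try_parse(raw.decode("utf-8").splitlines())

print(bytes_result)
print(text_result)
```

The first call reports the same "iterator should return strings, not bytes" message as the Lambda traceback, while the second returns the parsed rows.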

You can decode the result of read() into a string using the decode() method with the correct encoding, and then split it into lines, because csv.DictReader() iterates over its input and expects one line per item. The following would be a fix:

Change this

    csvcontent = csvfile['Body'].read().split(b'\n')

to this

    csvcontent = csvfile['Body'].read().decode('utf-8').splitlines()
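One caveat worth knowing: csv.DictReader() iterates over whatever it is given, so a single decoded string needs to be split into lines or wrapped in a file-like object first. Below is a rough sketch of the whole bytes-to-JSON step as a standalone function. The name csv_bytes_to_json and the sample data are assumptions for illustration, and UTF-8 is assumed as the file encoding.

```python
import csv
import io
import json


def csv_bytes_to_json(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode CSV bytes and return the rows as a JSON array of objects."""
    text = raw.decode(encoding)
    # Wrap the text in a file-like object; newline="" leaves line endings
    # untouched so the csv module can handle \n and \r\n itself.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return json.dumps(list(reader))


# Hypothetical sample standing in for csvfile['Body'].read().
sample = b"name,age\r\nalice,30\r\nbob,25\r\n"
result = csv_bytes_to_json(sample)
print(result)
```

Inside the handler, the returned string could be written to the file under /tmp or uploaded directly, depending on how the rest of the function is organized.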

A good way to debug these problems is to use the type() function to check what type a variable is. In your case, csvcontent is a list after split(), so print(type(csvcontent[0])) is the check that reveals the problem: it would show that the elements of csvcontent are bytes objects, not strings.
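As a small illustration of that check, here is a sketch using a made-up sample in place of the S3 body. Because split() returns a list, the element type is what shows whether the lines are bytes or str.

```python
# Hypothetical stand-in for csvfile['Body'].read().
raw = b"name,age\nalice,30\n"

csvcontent = raw.split(b"\n")
print(type(csvcontent))       # the container: a list
print(type(csvcontent[0]))    # each line: bytes, which csv rejects

decoded = raw.decode("utf-8").splitlines()
print(type(decoded[0]))       # each line: str, which csv accepts
```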

Upvotes: 2
