Sarah

Reputation: 617

AWS Lambda: How to read CSV files in S3 bucket then upload it to another S3 bucket?

I'm working on a project where I read files from an S3 bucket, remove all NA values, and then upload the result to a different S3 bucket. I've been watching Lambda tutorials and reading example code, but I'm having a hard time understanding how it really works.
My goal is to read any file in the S3 bucket, drop all the NA values using the Lambda function, and then upload the result to a different S3 bucket. But I don't really understand what is going on. I read the documentation, but it wasn't very helpful for me.
How can I make the code below read CSV files from the S3 bucket, drop all NA values, and then upload them to the new S3 bucket?

import json
import os
import boto3
import csv

def lambda_handler(event, context):
    
    for record in event['Records']:
        
        bucket = record['s3']['bucket']['name']
        file_key = record['s3']['object']['key']
        s3 = boto3.client('s3')
        
        csv_file = s3.get_object(Bucket=bucket, Key=file_key)
        # decode the bytes to text before handing the lines to csv.DictReader
        csv_content = csv_file['Body'].read().decode('utf-8').splitlines()
        
        csv_data = csv.DictReader(csv_content)

Any links to the documentation, or video and advice will be appreciated.

Upvotes: 0

Views: 2737

Answers (1)

samtoddler

Reputation: 9605

Uploading files

import logging
import boto3
from botocore.exceptions import ClientError

def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then file_name is used
    :return: True if file was uploaded, else False
    """

    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name

    # Upload the file
    s3_client = boto3.client('s3')
    try:
        response = s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True

s3 download_file

import boto3
s3 = boto3.resource('s3')
s3.meta.client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')

Now you simply combine these calls however you want to process your CSV files; how to process and upload to S3 efficiently would be a completely different topic.
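To tie the pieces together, here is a minimal sketch of a handler that reads each incoming CSV, drops any row with an empty or NA field, and writes the cleaned file to a second bucket. The destination bucket name and the `drop_na_rows` helper are illustrative choices of mine, not part of the boto3 API:

```python
import csv
import io

# Assumed destination bucket -- replace with your own bucket name
DEST_BUCKET = 'my-cleaned-csv-bucket'

def drop_na_rows(csv_text):
    """Return CSV text with any row containing an empty or NA field removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [row for row in reader
            if all(value not in ('', 'NA', 'N/A') for value in row.values())]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue()

def lambda_handler(event, context):
    import boto3  # available in the Lambda runtime by default
    s3 = boto3.client('s3')
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        file_key = record['s3']['object']['key']
        # Download, clean, and re-upload under the same key
        body = s3.get_object(Bucket=bucket, Key=file_key)['Body'].read().decode('utf-8')
        s3.put_object(Bucket=DEST_BUCKET, Key=file_key, Body=drop_na_rows(body))
```

For small files this in-memory approach is fine; make sure the Lambda's execution role has `s3:GetObject` on the source bucket and `s3:PutObject` on the destination bucket.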

There are plenty of answers here in this post: How to upload a file to directory in S3 bucket using boto

You can check this one as well if you're curious; it gives some idea of how to process larger files.

Step 4: Create the Lambda function that splits input data

Upvotes: 1
