Rafael Souza

Reputation: 15

How to run a Lambda function for multiple files in AWS S3

I have the following Lambda function to run my AWS Glue script whenever a new object is created in an S3 bucket.

import json
import boto3
from urllib.parse import unquote_plus

def lambda_handler(event, context):

    # Read the bucket and object key from the S3 event notification.
    bucketName = event["Records"][0]["s3"]["bucket"]["name"]
    fileNameFull = event["Records"][0]["s3"]["object"]["key"]
    fileName = unquote_plus(fileNameFull)

    print(bucketName, fileName)

    glue = boto3.client('glue')

    # Start one Glue job run for this object.
    response = glue.start_job_run(
        JobName = 'My_Job_Glue',
        Arguments = {
            '--s3_target_path_key': fileName,
            '--s3_target_path_bucket': bucketName
        }
    )

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

At first it works great, and I'm partially getting what I need. The problem is that I always have more than one new file for the same bucket, and the current logic runs my Glue script once per file (if I have 3 files, I get 3 Glue job runs). How could I improve my function so that it runs my script only once, after all the new data has arrived? Today I have Kafka Connect configured to batch 5000 records; if that batch size isn't reached after a few minutes, it flushes whatever records it has.

Upvotes: 0

Views: 702

Answers (1)

Mark Sailes

Reputation: 852

S3 can send its event notifications to Amazon Simple Queue Service (SQS), and Lambda can consume that queue. Using SQS as the event source lets Lambda receive messages in batches, so you can start one Glue job run per batch of files instead of one per file.
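As a minimal sketch of that setup: point the bucket's event notifications at an SQS queue, then attach the queue to the Lambda function as an event source. The SQS event source mapping lets you configure a batch size (and, for standard queues, a batching window), so a single invocation receives several S3 notifications at once. The handler below collects every object key in the batch and starts one Glue job run for all of them. Note that the --s3_target_path_keys argument that joins the keys with commas is a hypothetical convention, not something Glue defines; adapt it to whatever your job script expects.

import json
import boto3
from urllib.parse import unquote_plus

glue = boto3.client('glue')

def lambda_handler(event, context):
    keys = []
    bucket = None

    # Each SQS record's body holds one S3 event notification (JSON).
    for sqs_record in event["Records"]:
        body = json.loads(sqs_record["body"])
        # S3 "test" messages have no "Records" key, so default to an empty list.
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            keys.append(unquote_plus(s3_record["s3"]["object"]["key"]))

    if keys:
        # One job run for the whole batch instead of one per file.
        glue.start_job_run(
            JobName='My_Job_Glue',
            Arguments={
                '--s3_target_path_bucket': bucket,
                # Hypothetical parameter: pass all keys as a comma-separated list.
                '--s3_target_path_keys': ','.join(keys)
            }
        )

    return {'statusCode': 200}

Keep in mind the batching is best-effort: files that arrive far enough apart will land in separate batches, so the Glue job should still be safe to run more than once.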

Upvotes: 1
