Rajeev

Reputation: 1061

AWS Lambda Python - splitting a file into smaller files runs in an indefinite loop

I am trying to split a file into multiple smaller files. The logic works fine for a single file when run outside Lambda, but once I add the code to trigger it from Lambda, the script runs in a loop without completing and writes the files incorrectly.

Based on my debugging so far, the outer for loop executes several times even though only one file initiated the trigger.
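One way to confirm this, as a minimal sketch separate from my actual handler below, would be to log how many records each invocation receives and which S3 event produced them:

    import json

    def lambda_handler(event, context):
        # minimal debugging handler: print what each invocation actually received,
        # to tell repeated trigger invocations apart from a looping outer for-loop
        print('records in this invocation:', len(event['Records']))
        for record in event['Records']:
            print(record['eventName'], record['s3']['object']['key'])
        return {'statusCode': 200, 'body': json.dumps('ok')}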

Logic Flow:

A file lands in /bigfile/, the Lambda triggers, splits the file according to the logic below, and places the smaller files in /splitfiles/.
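For reference, the trigger wiring I have in mind looks roughly like the sketch below (the bucket name and function ARN are placeholders, not my real values, and I am not showing my actual notification setup):

    import boto3

    s3client = boto3.client('s3')

    # placeholder bucket name and Lambda ARN; only objects created under
    # bigfile/ are meant to invoke the split function
    s3client.put_bucket_notification_configuration(
        Bucket='my-bucket',
        NotificationConfiguration={
            'LambdaFunctionConfigurations': [{
                'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:split-file',
                'Events': ['s3:ObjectCreated:Put'],
                'Filter': {'Key': {'FilterRules': [
                    {'Name': 'prefix', 'Value': 'bigfile/'}
                ]}}
            }]
        }
    )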

File Content:

ABC|filename1.DAT|123

CDE|filename2.DAT|8910

XYZ|filename3.DAT|456

FGH|filename4.DAT|4545

Output:

File1:

ABC|filename1.DAT|123

CDE|filename2.DAT|8910

File2:

XYZ|filename3.DAT|456

FGH|filename4.DAT|4545

Code:

import boto3
import os

s3client = boto3.client('s3')
s3 = boto3.resource('s3')

def lambda_handler(event, context):
    try:
        for record in event['Records']:
            bucket = record['s3']['bucket']['name']
            key = record['s3']['object']['key']
            print(key)
            obj = s3.Object(bucket, key)
            # the streaming body returns bytes, so decode before splitting into lines
            linesplit = obj.get()['Body'].read().decode('utf-8').split('\n')
            lines_per_file = 2  # number of lines per split file
            created_files = 0
            sfilelines = ''
            for rownum, line in enumerate(linesplit, start=1):
                sfilelines = sfilelines + '\n' + line
                if rownum % lines_per_file == 0:
                    cnt = lines_per_file * (created_files + 1)
                    file_name = "%s_%s.DAT" % ('Testfile', cnt)
                    target_file = "splitfiles/" + file_name
                    print(target_file)
                    s3client.put_object(ACL='public-read', ServerSideEncryption='AES256',
                                        Bucket=bucket, Key=target_file, Body=sfilelines)
                    sfilelines = ''  # reset the buffer
                    created_files += 1  # one more small file has been created
            if sfilelines:  # write the pending lines that did not fill a full chunk
                cnt = lines_per_file * (created_files + 1)
                file_name = "%s_%s.DAT" % ('Testfile', cnt)
                target_file = "splitfiles/" + file_name
                print(target_file)
                s3client.put_object(ACL='public-read', ServerSideEncryption='AES256',
                                    Bucket=bucket, Key=target_file, Body=sfilelines)
                created_files += 1
        print('%s split files (with <= %s lines each) were created.' % (created_files, lines_per_file))

    except Exception as e:
        print(e)

Upvotes: 0

Views: 2518

Answers (1)

Chris Johnson

Reputation: 21956

Depending on how you've defined the Lambda trigger, you may be getting more than one Lambda activation per file, i.e. on different S3 object lifecycle events.
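If that is what is happening, a rough guard like the sketch below (not your exact code; process() is a stand-in for your existing split logic) keeps a single upload from being handled more than once, by ignoring objects the function itself writes and anything that isn't a plain ObjectCreated event:

    def process(record):
        # hypothetical placeholder for the existing split logic
        print('would split', record['s3']['object']['key'])

    def lambda_handler(event, context):
        for record in event['Records']:
            key = record['s3']['object']['key']
            # skip objects the function wrote itself, and any non-create event,
            # so one upload to bigfile/ means exactly one split pass
            if key.startswith('splitfiles/') or not record['eventName'].startswith('ObjectCreated'):
                continue
            process(record)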

Upvotes: 1
