Reputation: 1061
I am trying to split a file into multiple smaller files. The logic works fine for a single file without Lambda, but once I add the code to trigger from Lambda, the script runs in a loop without completing and writes the files incorrectly.
Based on my debugging so far, the outer for loop executes several times even though only one file initiated the trigger.
Logic Flow:
A file lands in /bigfile/, the Lambda triggers, splits the file based on the logic, and places the smaller files in /splitfiles/.
File Content:
ABC|filename1.DAT|123
CDE|filename2.DAT|8910
XYZ|filename3.DAT|456
FGH|filename4.DAT|4545
Expected output:
File1:
ABC|filename1.DAT|123
CDE|filename2.DAT|8910
File2:
XYZ|filename3.DAT|456
FGH|filename4.DAT|4545
Code:
import boto3

s3client = boto3.client('s3')
s3 = boto3.resource('s3')

def lambda_handler(event, context):
    try:
        for record in event['Records']:
            bucket = record['s3']['bucket']['name']
            key = record['s3']['object']['key']
            print(key)
            obj = s3.Object(bucket, key)
            # read() returns bytes, so decode before splitting into lines
            linesplit = obj.get()['Body'].read().decode('utf-8').split('\n')
            lines_per_file = 2  # number of lines per small file
            created_files = 0
            sfilelines = ''
            for rownum, line in enumerate(linesplit, start=1):
                # append the line followed by a newline (prepending '\n'
                # would put a blank first line in every output file)
                sfilelines = sfilelines + line + '\n'
                if rownum % lines_per_file == 0:
                    cnt = lines_per_file * (created_files + 1)
                    body_contents = str(sfilelines)
                    file_name = "%s_%s.DAT" % ('Testfile', cnt)
                    target_file = "splitfiles/" + file_name
                    print(target_file)
                    s3client.put_object(ACL='public-read', ServerSideEncryption='AES256',
                                        Bucket=bucket, Key=target_file, Body=body_contents)
                    sfilelines = ''     # reset the buffer
                    created_files += 1  # one more small file has been created
            if sfilelines:  # write any leftover lines (avoids an empty file when the count divides evenly)
                cnt = lines_per_file * (created_files + 1)
                body_contents = str(sfilelines)
                file_name = "%s_%s.DAT" % ('Testfile', cnt)
                target_file = "splitfiles/" + file_name
                print(target_file)
                s3client.put_object(ACL='public-read', ServerSideEncryption='AES256',
                                    Bucket=bucket, Key=target_file, Body=body_contents)
                created_files += 1
            print('%s split files (with <= %s lines each) were created.' % (created_files, lines_per_file))
    except Exception as e:
        print(e)
Upvotes: 0
Views: 2518
Reputation: 21956
Depending on how you've defined the Lambda trigger, you may be getting more than one Lambda invocation per file, i.e. on different S3 object lifecycle events.
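There is also a second way this setup can loop: the function writes its output to splitfiles/ in the same bucket, so if the trigger covers the whole bucket, every split file it writes raises another ObjectCreated event and re-invokes the function. As a minimal sketch (assuming, as in the question, that source files land under a bigfile/ prefix), the handler can skip any object outside that prefix:

def lambda_handler(event, context):
    for record in event['Records']:
        key = record['s3']['object']['key']
        # Skip anything outside the input prefix, in particular the
        # split files this function itself writes to splitfiles/;
        # otherwise each put_object re-invokes the function.
        if not key.startswith('bigfile/'):
            print('Skipping %s (not under bigfile/)' % key)
            continue
        # ... splitting logic from the question goes here ...

The cleaner fix is to scope the trigger itself: configure the S3 event notification with a single event type (e.g. s3:ObjectCreated:Put rather than s3:ObjectCreated:*) and a key prefix filter of bigfile/, so that writes to splitfiles/ never invoke the function at all.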
Upvotes: 1