Chako

Reputation: 11

AWS Lambda - Python - S3 to Drive Service working, problem with Large Files

So, I'm creating a service that copies files from an S3 bucket to a Drive folder. A Lambda is triggered when an object is created in the S3 bucket/folder. I grab the file (a large TXT file) and copy its contents "locally" so that I can upload it to Drive through its API. The program works great, except for big files, where it runs out of space:

     Traceback (most recent call last):
       File "/var/task/lambda_function.py", line 67, in lambda_handler
         raise e
       File "/var/task/lambda_function.py", line 56, in lambda_handler
         f.write(chunk[0:last_newline+1].decode('utf-8'))
     OSError: [Errno 28] No space left on device

Here's the code that generates the issue; the file is around 270 MB, and I'm testing with the Lambda configured with 2 GB of memory.

import os

import boto3

s3 = boto3.client('s3')

# bucket and key come from the S3 event that triggered the Lambda
obj = s3.get_object(Bucket=bucket, Key=key)  # Get the object that triggered the Lambda
fullpath = "/tmp/" + key  # Create a "local file" keeping the original name (assumes key has no '/' in it)
os.chdir('/tmp')
f = open(fullpath, "w+")  # Open the file to start writing to it
body = obj['Body']
chunk_size = 1000000 * 10  # Read in chunks of ~10 MB
newline = '\n'.encode()
partial_chunk = b''
while True:
    data = body.read(chunk_size)
    if not data:
        # Write any trailing partial line; breaking only on an empty chunk
        # would loop forever if the file doesn't end with a newline
        f.write(partial_chunk.decode('utf-8'))
        break
    chunk = partial_chunk + data
    last_newline = chunk.rfind(newline)
    f.write(chunk[0:last_newline + 1].decode('utf-8'))  # Writing to /tmp (FAILING HERE: no space left on device)
    f.flush()
    # Keep the partial line read past the last newline
    partial_chunk = chunk[last_newline + 1:]
f.close()
upload(parent, filename, key, fullpath)

And for the upload, I'm doing:

def upload(parent, filename, key, fullpath):
    ...
    drive_service = build('drive', 'v3', credentials=credenciales)
    media_body1 = MediaFileUpload(fullpath, resumable=True, chunksize=1024 * 1024 * 5)  # Using 5 MB chunks
    body1 = {'name': filename, 'parents': [parent]}  # Use the bare filename, not the /tmp path, as the Drive name
    Filesubido = drive_service.files().create(body=body1, media_body=media_body1, supportsAllDrives=True)
    response = None
    while response is None:
        time.sleep(5)
        status, response = Filesubido.next_chunk()
        if status:
            print("Uploaded " + str(int(status.progress() * 100)) + "%")
    print("Upload Complete!")

Things I've tried:

Questions:

Cheers!

Upvotes: 1

Views: 752

Answers (1)

John Rotenstein

Reputation: 270089

Update: The 512 MB storage limit for Lambda functions has since been increased; see AWS Lambda Now Supports Up to 10 GB Ephemeral Storage | AWS News Blog.
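
If the files fit within a larger allocation, the simplest fix today is to raise the function's ephemeral storage above the 512 MB default (up to 10,240 MB). A minimal sketch using boto3; the function name is a placeholder:

import boto3

lambda_client = boto3.client('lambda')

# Raise /tmp from the 512 MB default to 10 GB (Size is in MB, valid range 512-10240)
# 'my-s3-to-drive-function' is a placeholder for your function's name
lambda_client.update_function_configuration(
    FunctionName='my-s3-to-drive-function',
    EphemeralStorage={'Size': 10240},
)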


AWS Lambda functions have 512 MB of storage available on the /tmp drive. When a function has finished with a temporary file, it should delete the file so that the space is available to future invocations that reuse the same Lambda container.
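
For example, the handler could wrap the work in try/finally so the temporary file is always removed, whether or not the upload succeeds (a sketch using the variable names from the question):

import os

fullpath = "/tmp/" + key
try:
    # ... write the S3 object to fullpath, then upload it to Drive ...
    upload(parent, filename, key, fullpath)
finally:
    # Free /tmp for later invocations that reuse this container
    if os.path.exists(fullpath):
        os.remove(fullpath)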

As long as your files are under 512 MB, this should work fine for you.

If they are larger than 512 MB, you would need to read chunks of the data from Amazon S3 and upload them to their destination, looping through all the chunks, as sketched below. This makes both the reading and the writing more complex.
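
One way to do that without touching /tmp at all is to drive Google Drive's resumable-upload protocol directly: create an upload session, then stream the S3 body to it chunk by chunk with Content-Range headers. A rough sketch; the access token and IDs are placeholders, and error handling and retries are omitted (intermediate chunks return HTTP 308, the final one 200/201):

import boto3
import requests

s3 = boto3.client('s3')
CHUNK = 5 * 1024 * 1024  # Drive requires chunk sizes in multiples of 256 KB (except the last)

def s3_to_drive(bucket, key, parent_id, access_token):
    size = s3.head_object(Bucket=bucket, Key=key)['ContentLength']
    body = s3.get_object(Bucket=bucket, Key=key)['Body']

    # Start a resumable-upload session; Drive returns the session URI in the Location header
    session = requests.post(
        'https://www.googleapis.com/upload/drive/v3/files'
        '?uploadType=resumable&supportsAllDrives=true',
        headers={'Authorization': 'Bearer ' + access_token},
        json={'name': key, 'parents': [parent_id]},
    )
    session.raise_for_status()
    upload_url = session.headers['Location']

    # Stream the object: read one chunk from S3, PUT it to Drive, repeat
    sent = 0
    while sent < size:
        chunk = body.read(CHUNK)
        end = sent + len(chunk) - 1
        requests.put(
            upload_url,
            headers={'Content-Range': 'bytes %d-%d/%d' % (sent, end, size)},
            data=chunk,
        )
        sent += len(chunk)

Only one chunk is ever held in memory at a time, so neither the 512 MB /tmp limit nor the Lambda memory setting constrains the file size.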

Alternatively, you could use a different compute service, such as Amazon EC2 or Fargate.

Upvotes: 0
