Reputation: 41
I have a few hundred PDFs in an S3 bucket, and I want a Lambda function that creates a zip file containing all of them.
Doing this in Python locally is obviously easy enough, and I had assumed the logic would transfer over to AWS Lambda in a pretty straightforward way. But so far I haven't managed to get it working.
I have been using the zipfile Python library, as well as boto3. My logic is as simple as finding all the files, appending them to a files_to_zip list, and then iterating through that list, writing each one to the new zip file.
This has, however, kicked up a number of issues, and I think that's due to my shortfalls in understanding how reading and writing files works in Lambda.
Here is the code I have tried so far:
import os
import boto3
from io import BytesIO, StringIO
from zipfile import ZipFile, ZIP_DEFLATED

def zipping_files(event, context):
    s3 = boto3.resource('s3')
    BUCKET = 'BUCKET NAME'
    PREFIX_1 = 'KEY NAME'
    new_zip = r'NEW KEY NAME'

    s3_client = boto3.client('s3')
    files_to_zip = []
    response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX_1)
    all = response['Contents']
    for i in all:
        files_to_zip.append(str(i['Key']))

    with ZipFile(new_zip, 'w', compression=ZIP_DEFLATED, allowZip64=True) as new_zip:
        for file in files_to_zip:
            new_zip.write(file)
I am getting error messages such as FileNotFoundError for my new_zip path, and that the file system is read-only.
Upvotes: 3
Views: 9769
Reputation: 384
Here is how we can solve this:
import os
import tempfile
import boto3
import botocore
from zipfile import ZipFile, ZIP_DEFLATED

def zipping_files(event, context):
    s3 = boto3.resource('s3')
    BUCKET = 'BUCKET NAME'
    PREFIX_1 = 'KEY NAME'

    s3_client = boto3.client('s3')
    files_to_zip = []
    response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX_1)
    contents = response['Contents']
    for i in contents:
        files_to_zip.append(str(i['Key']))

    # Download all files to Lambda's /tmp directory; for that we recreate
    # the same directory structure (subdirectories) in /tmp as the S3 keys have.
    for KEY in files_to_zip:
        try:
            local_file_name = '/tmp/' + KEY
            if not os.path.isdir(os.path.dirname(local_file_name)):
                os.makedirs(os.path.dirname(local_file_name))
            s3.Bucket(BUCKET).download_file(KEY, local_file_name)
        except botocore.exceptions.ClientError as e:
            print(e.response)

    # Now create an empty zip file in the /tmp directory; use the suffix .zip.
    with tempfile.NamedTemporaryFile('w', suffix='.zip', delete=False) as f:
        with ZipFile(f.name, 'w', compression=ZIP_DEFLATED, allowZip64=True) as zipf:
            for file in files_to_zip:
                zipf.write('/tmp/' + file)

    # Once zipped in /tmp, copy the archive to your preferred S3 location.
    s3_client.upload_file(f.name, BUCKET, 'destination_s3_path e.g. out/filename.zip')
    print('All files zipped successfully!')
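Note that this approach stages everything on the function's local disk: Lambda's /tmp is limited to 512 MB by default (ephemeral storage can be raised up to 10 GB), so the downloaded PDFs plus the finished zip must all fit within that limit.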
Upvotes: 2
Reputation: 4950
This code sample attempts to create a local file NEW KEY NAME on the local filesystem of the Lambda function's container, in the default working directory (which is /var/task, afaik). That part of the filesystem is read-only, which is why the write fails.
Step 1: make a decent file path in the /tmp directory, i.e. os.path.join('/tmp', target_filename).
Step 2: your code is not uploading the zipfile to S3. Add a call to s3_client.put_object.
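For illustration, here is a minimal sketch of that approach, building the zip in an in-memory BytesIO buffer instead of a local file and uploading it with put_object. The bucket, prefix, and destination key are placeholders, and this assumes the whole archive fits in the function's memory; also note list_objects_v2 returns at most 1,000 keys per call, which covers a few hundred PDFs, but paginate for more.

import boto3
from io import BytesIO
from zipfile import ZipFile, ZIP_DEFLATED

def zipping_files(event, context):
    s3_client = boto3.client('s3')
    BUCKET = 'BUCKET NAME'
    PREFIX_1 = 'KEY NAME'

    response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX_1)
    keys = [obj['Key'] for obj in response['Contents']]

    # Build the archive entirely in memory; nothing touches the filesystem.
    buffer = BytesIO()
    with ZipFile(buffer, 'w', compression=ZIP_DEFLATED, allowZip64=True) as zipf:
        for key in keys:
            body = s3_client.get_object(Bucket=BUCKET, Key=key)['Body'].read()
            zipf.writestr(key, body)

    # Upload the finished zip back to S3.
    buffer.seek(0)
    s3_client.put_object(Bucket=BUCKET, Key='NEW KEY NAME', Body=buffer.getvalue())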
Upvotes: 0