Reputation: 477
I am attempting to move files from one folder to another in the same bucket after processing each one. I am running this Python 3 script on an EC2 instance.
import os.path
from boto3.session import Session

ACCESS_KEY = 'XXXXXXXXXXXXXXXX'
SECRET_KEY = 'YYYYYYYYYYYYYYYYYY'
BUCKET_NAME = 'test-s3logs'
MAX_FILES_READ = 10
SOURCE_PREFIX = 'logs/'
DESTINATION_PREFIX = 'processed/'

words = ['London', 'user/xxxx']

if __name__ == "__main__":
    # Use boto3 to connect to S3 and list objects in the bucket
    session = Session(aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
    s3 = session.resource('s3')
    source_bucket = s3.Bucket(BUCKET_NAME)
    dest_bucket = s3.Bucket(BUCKET_NAME)

    index = 0
    for s3_file in source_bucket.objects.filter(Prefix=SOURCE_PREFIX):
        index += 1
        if index > MAX_FILES_READ:   # stop after MAX_FILES_READ files (== would stop one early)
            break
        found_get_record = False
        source_obj = s3.Object(BUCKET_NAME, s3_file.key)
        # iter_lines() is the public API; avoid the private _raw_stream attribute
        for line in source_obj.get()['Body'].iter_lines():
            line_found = line.decode('utf-8')
            if all(word in line_found for word in words):
                found_get_record = True
                # insert record in database
                print('insert into db')
        if found_get_record:
            # move the file to the processed folder
            print('moving file: {}'.format(s3_file.key))
            old_source = {'Bucket': BUCKET_NAME, 'Key': s3_file.key}
            dest_obj = dest_bucket.Object(s3_file.key.replace(SOURCE_PREFIX, DESTINATION_PREFIX, 1))
            dest_obj.copy(old_source)
            source_obj.delete()
        else:
            # keywords not found; delete the file
            print('deleting file: {}'.format(s3_file.key))
            source_obj.delete()
Logic:
The "logs" folder is populated with log files from a different process.
This script periodically checks the "logs" folder, opens each file for reading, and looks for certain keywords. If they are found, some details are inserted in the DB and the file is moved to the "processed" folder. If the keywords are not found, the log file is simply deleted.
Question:
When no log files remain, the script deletes the "logs" folder as well. How can I stop that?
I've done my best with the script. Is there a way to clean this up?
Otherwise, it runs fine.
Upvotes: 1
Views: 5263
Reputation: 269330
Folders do not actually exist in Amazon S3.
For example, you could create an object called invoices/foo.txt and it will work fine even if there is no invoices folder. The invoices folder will magically appear in the console, and will also magically disappear when there are no more objects inside it.
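That illusion can be demonstrated without touching S3 at all: the console simply derives folder names from the keys of the objects that exist. A toy sketch of that derivation (the visible_folders helper is ours, not a boto3 or S3 API):

```python
def visible_folders(keys):
    """Derive the 'folders' the S3 console would show from a list of object keys."""
    folders = set()
    for key in keys:
        parts = key.split('/')[:-1]  # everything before the object name itself
        for depth in range(1, len(parts) + 1):
            folders.add('/'.join(parts[:depth]) + '/')
    return sorted(folders)

print(visible_folders(['invoices/foo.txt']))  # ['invoices/'] -- the folder "appears"
print(visible_folders([]))                    # [] -- delete the object and it "disappears"
```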
Therefore, one option is simply do not worry about folders.
If you do create a folder within the S3 management console, a zero-length object is created with a Key equal to the name of the folder. This forces an empty folder to appear, even though it doesn't actually exist.
To prevent the 'folder' from being deleted, simply do not delete the zero-length object that has the same name as the folder. Your code could either check the object's length or check whether its Key matches the folder name (with its full path) and, if so, skip deleting it.
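Applied to the script in the question, that check is a one-line guard at the top of the loop. A sketch using the question's SOURCE_PREFIX; the is_folder_placeholder helper is our own name, not a boto3 call:

```python
SOURCE_PREFIX = 'logs/'  # the question's source prefix

def is_folder_placeholder(key, size):
    """True for the zero-length object the console creates to represent the folder."""
    return size == 0 and key == SOURCE_PREFIX

# In the question's loop, ObjectSummary exposes .key and .size, so you can skip
# the placeholder before processing or deleting anything:
#
#     for s3_file in source_bucket.objects.filter(Prefix=SOURCE_PREFIX):
#         if is_folder_placeholder(s3_file.key, s3_file.size):
#             continue
```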
Upvotes: 1