Reputation: 79
I am using Python 2.7.x and Boto API 2.x to connect to an AWS S3 bucket. I have a unique situation where I want to download files from an S3 bucket, but only from a specific directory/folder, say myBucket/foo/
. The catch is that I want to leave the latest file behind in the S3 folder and not download it. Once I download these files to my local box, I want to move them to a different folder under the same bucket, say myBucket/foo/bar/
. Has anyone worked on a similar situation before?
Here is some explanation:
My S3 bucket: event-logs
The folder path on the S3 bucket from which files will be downloaded:
event-logs/apps/raw/source_data/
The folder path on the S3 bucket to which the downloaded files will be moved (archived):
event-logs/apps/raw/archive_data/
Note: the "event-logs/apps/raw/" prefix is common to both paths under the same bucket.
So if I have 5 files under the source_data folder on S3:
s3://event-logs/apps/raw/source_data/data1.gz
s3://event-logs/apps/raw/source_data/data2.gz
s3://event-logs/apps/raw/source_data/data3.gz
s3://event-logs/apps/raw/source_data/data4.gz
s3://event-logs/apps/raw/source_data/data5.gz
I need to download the first 4 files (the oldest ones) to my local machine and leave the latest file, i.e. data5.gz, behind. After the download is complete, those files should be moved from the S3 ../source_data folder to the ../archive_data folder under the same bucket and deleted from the original source_data folder. Here is my code to list the files in S3, download them, and then delete them.
AwsLogShip = AwsLogShip(aws_access_key, aws_secret_access_key, use_ssl=True)
bucket = AwsLogShip.getFileNamesInBucket(aws_bucket)

def getFileNamesInBucket(self, aws_bucket):
    if not self._bucketExists(aws_bucket):
        self._printBucketNotFoundMessage(aws_bucket)
        return list()
    else:
        bucket = self._aws_connection.get_bucket(aws_bucket)
        # Collect the names of all keys under the source_data/ prefix
        return map(lambda aws_file_key: aws_file_key.name, bucket.list("apps/raw/source_data/"))
AwsLogShip.downloadAllFilesFromBucket(aws_bucket, local_download_directory)

def downloadFileFromBucket(self, aws_bucket, filename, local_download_directory):
    if not self._bucketExists(aws_bucket):
        self._printBucketNotFoundMessage(aws_bucket)
    else:
        bucket = self._aws_connection.get_bucket(aws_bucket)
        for s3_file in bucket.list("apps/raw/source_data"):
            if filename == s3_file.name:
                self._downloadFile(s3_file, local_download_directory)
                break
AwsLogShip.deleteAllFilesFromBucket(aws_bucket)

def deleteFilesInBucketWith(self, aws_bucket, filename):
    if not self._bucketExists(aws_bucket):
        self._printBucketNotFoundMessage(aws_bucket)
    else:
        bucket = self._aws_connection.get_bucket(aws_bucket)
        # Delete every key under the prefix whose name matches the given filename
        for s3_file in filter(lambda fkey: filename in fkey.name, bucket.list("apps/raw/source_data/")):
            self._deleteFile(bucket, s3_file)
What I really want to achieve here is: download everything under source_data/ except the most recently modified file, then move the downloaded files to archive_data/ and delete them from source_data/. A sketch of the intended flow follows.
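For clarity, here is a minimal sketch of the whole flow using boto 2's bucket.list, Key.get_contents_to_filename, Bucket.copy_key, and Bucket.delete_key. The bucket name, prefixes, and local directory come from the description above; the helper name archive_all_but_latest is made up for illustration.

import os
import boto

def archive_all_but_latest(aws_access_key, aws_secret_access_key, local_download_directory):
    conn = boto.connect_s3(aws_access_key, aws_secret_access_key)
    bucket = conn.get_bucket('event-logs')

    # Sort the keys under source_data/ oldest-first; last_modified is an
    # ISO-style timestamp string, so lexicographic order is chronological.
    keys = sorted(bucket.list(prefix='apps/raw/source_data/'),
                  key=lambda k: k.last_modified)

    # Everything except the newest key gets downloaded, copied, and deleted.
    for key in keys[:-1]:
        if key.name.endswith('/'):  # skip the zero-byte "folder" placeholder, if any
            continue
        local_path = os.path.join(local_download_directory, os.path.basename(key.name))
        key.get_contents_to_filename(local_path)

        # "Move" = copy into archive_data/ within the same bucket, then delete the original.
        archived_name = key.name.replace('apps/raw/source_data/', 'apps/raw/archive_data/')
        bucket.copy_key(archived_name, bucket.name, key.name)
        bucket.delete_key(key.name)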
Upvotes: 0
Views: 2534
Reputation: 79
This is how I solved this problem!
bucket_list = bucket.list(prefix='Download/test_queue1/', delimiter='/')
list1 = sorted(bucket_list, key=lambda item1: item1.last_modified)
self.list2 = list1[:-1]  # everything except the most recently modified key
for item in self.list2:
    self._bucketList(bucket, item)

def _bucketList(self, bucket, item):
    print item.name, item.last_modified
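The snippet above only selects the keys to process and prints them. Each selected item still has to be downloaded and then moved to the archive prefix; a sketch of that step, assuming a hypothetical local target directory /tmp and the source/archive prefixes from the question:

def _bucketList(self, bucket, item):
    # Download the key locally, then "move" it by copy + delete within the same bucket
    item.get_contents_to_filename('/tmp/' + item.name.split('/')[-1])
    bucket.copy_key(item.name.replace('source_data', 'archive_data'), bucket.name, item.name)
    bucket.delete_key(item.name)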
Upvotes: 0