Reputation: 1201
I am currently trying to extract data from AWS S3 using the code below. The code works just fine. However, I want to run this script every hour and extract only the incremental files that were added within the last hour; as written, it extracts all the files from the folder each time. How could I modify it to extract only the files that are not already in the LOCAL_PATH folder?
import boto, os
import datetime
from os import path

current_time = datetime.datetime.now().strftime("%Y-%m-%d")
LOCAL_PATH = '/Users/user/Desktop/rep'
AWS_ACCESS_KEY_ID = 'ACCESS'
AWS_SECRET_ACCESS_KEY = 'SECRET'
bucket_name = 'bucket'

# connect to the bucket
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(bucket_name)

# go through the list of files
bucket_list = bucket.list(prefix='FolderName/{}'.format(current_time))
#bucket_list = bucket.list()
for l in bucket_list:
    keyString = str(l.key)
    d = LOCAL_PATH + keyString
    try:
        l.get_contents_to_filename(d)
    except OSError:
        # check if dir exists
        if not os.path.exists(d):
            os.makedirs(d)
Could anyone assist? Thanks.
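One way to get the incremental behaviour described above is to skip any key whose file already exists under LOCAL_PATH. A minimal sketch of that check, assuming each S3 key maps to a relative path under the local folder (the helper name keys_to_fetch is made up here, not from boto):

```python
import os

def keys_to_fetch(keys, local_path):
    # Keep only the keys whose target file does not already exist locally,
    # so files downloaded on a previous hourly run are skipped.
    return [k for k in keys
            if not os.path.exists(os.path.join(local_path, k))]

# Hypothetical wiring into the loop from the question:
# for l in bucket_list:
#     d = os.path.join(LOCAL_PATH, l.key)
#     if os.path.exists(d):
#         continue  # already fetched on an earlier run
#     l.get_contents_to_filename(d)
```

Note this only avoids re-downloading; it will not detect a file that changed in S3 after it was first fetched (comparing the key's ETag or last-modified timestamp would be needed for that).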
Upvotes: 0
Views: 1124
Reputation: 5729
If your requirement is to download any recently added files to the local file system, then running a cron job every hour is the old-school solution.
You should try AWS Lambda with an S3 event trigger instead of running a cron job every hour: Lambda fires the moment an object is created, so there is nothing to re-scan. Do some homework on how to set up Lambda and use it. I think that's the better solution by design.
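To illustrate the event-driven approach, here is a minimal sketch of a Lambda handler for an S3 ObjectCreated notification. The event structure (Records → s3 → bucket/object) is the standard S3 notification payload; the processing step is a placeholder print, and the helper object_refs is a made-up name:

```python
def object_refs(event):
    # Pull (bucket, key) pairs out of an S3 ObjectCreated event payload.
    return [(r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
            for r in event.get("Records", [])]

def lambda_handler(event, context):
    # Each invocation receives only the object(s) that just arrived,
    # so there is no hourly scan and no duplicate download.
    for bucket, key in object_refs(event):
        # Replace this with the real work, e.g. a boto3 download or copy.
        print("new object: s3://{}/{}".format(bucket, key))
    return {"handled": len(object_refs(event))}
```

If the files ultimately need to land on a desktop machine rather than in AWS, Lambda alone won't reach it; a common pattern is to have Lambda copy or index the new objects, and let the local machine sync from there.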
Upvotes: 1