Taukheer

Reputation: 1201

Python - Extracting only incremental files from AWS S3

I am extracting data from AWS S3 using the code below, and it works fine. However, I run the script every hour and only want the incremental files, that is, the files added within the last hour. As written, the code downloads every file in the folder on each run. How can I modify it to download only the files that are not already in the LOCAL_PATH folder?

import boto, os
import datetime
from os import path

current_time = datetime.datetime.now().strftime("%Y-%m-%d")


LOCAL_PATH = '/Users/user/Desktop/rep'

AWS_ACCESS_KEY_ID = 'ACCESS'
AWS_SECRET_ACCESS_KEY = 'SECRET'
bucket_name = 'bucket'

# connect to the bucket
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(bucket_name)


# go through the list of files
bucket_list = bucket.list(prefix='FolderName/{}'.format(current_time))

#bucket_list = bucket.list()
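for l in bucket_list:
  keyString = str(l.key)
  # join with a separator; plain concatenation gives '.../repFolderName/...'
  d = os.path.join(LOCAL_PATH, keyString)
  # skip "folder" placeholder keys, which have no file content
  if keyString.endswith('/'):
    continue
  # make sure the parent directory exists before downloading
  parent = os.path.dirname(d)
  if not os.path.exists(parent):
    os.makedirs(parent)
  l.get_contents_to_filename(d)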

Could anyone assist? Thanks.

Upvotes: 0

Views: 1124

Answers (1)

Red Boy

Reputation: 5729

If your requirement is to download recently added files to the local file system, then running a cron job every hour is the old-school solution.
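If you do keep the hourly script for now, here is a minimal sketch of the "only files not already in LOCAL_PATH" check the question asks for. The helper names `local_path_for` and `missing_keys` are hypothetical, and the sketch assumes a file that exists locally is complete and does not need to be downloaded again.

```python
import os


def local_path_for(local_root, key):
    # Join the S3 key onto the local root. Stripping a leading "/" stops
    # os.path.join from discarding local_root for absolute-looking keys.
    return os.path.join(local_root, key.lstrip("/"))


def missing_keys(keys, local_root):
    """Return the keys that have no file under local_root yet."""
    return [
        k for k in keys
        if not k.endswith("/")  # skip "folder" placeholder keys
        and not os.path.isfile(local_path_for(local_root, k))
    ]
```

The question's loop would then call `l.get_contents_to_filename(...)` only for keys in the returned list, after creating the parent directory with `os.makedirs`.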

You should try an AWS Lambda function triggered by S3 events instead of running a cron job every hour. Do some homework on how to set up Lambda and use it. I think that's a better solution by design.
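As a rough sketch of what that could look like, the handler below only pulls the bucket name and object key out of the S3 event notification that invokes it. The function name `new_object_keys` is hypothetical, and what you do with each key afterwards (copy it, or notify something your local machine polls) is left open here.

```python
from urllib.parse import unquote_plus


def new_object_keys(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in the notification (spaces as "+").
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def lambda_handler(event, context):
    # Illustrative handler: it only reports which objects triggered the call.
    return {"new_objects": new_object_keys(event)}
```

Because the function is invoked per upload, there is no need to list the whole folder and work out which files are new.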

Upvotes: 1
