yacub elmi

Reputation: 23

Python: downloading an S3 file based on name and date

I'm trying to pull a file from S3 based on the ID and date in its filename:

Naming Convention:

Filenames all follow the same pattern, ID_NAME_DATE.csv:

example : 9919USEN_File_20180216.csv

example : 9919GBEN_File_20180211.csv

Code:

import boto3
import re

def downloadFiletest():
    #connect to s3
    client = boto3.resource(u's3', aws_access_key_id=u'KEY',
                           aws_secret_access_key=u'TOKEN')

    #used for downloading
    s3 = boto3.client(u's3', aws_access_key_id=u'KEY',
                           aws_secret_access_key=u'TOKEN')
    dateIdReg = '[0-9]{8}'
    dateSuffix = re.compile(date)
    print (u"= S3 Client Connected =")
    # configure s3 bucket
    bucket = client.Bucket(u'us-eu-Bucket')
    b_folder = "/folder/example/"
    c_folder = b_folder.lower() + '/'
    files_not_found = True
    for cList in bucket.objects.filter(Prefix=b_folder):
        cFiles= cList.key
        print ('file : ', cFiles)
        for fileId in cFiles.lower():
            files_not_found = False
            f = fileId.rstrip()
            print(f)
            fileidreg= '[0-9]{4}[a-zA-Z]{4}'
            FileID = re.compile(fileidreg)
            if FileID.match(f) and dateSuffix.match(f):
                print(u'cList.key.lower(): ', cList.key.lower())
                old_file = cList.key
                dot_index = old_file.find(u'.')
                print (u'old dot file name: ', dot_index)
                file_ext = old_file[dot_index:]
                cfile = fileId + '_file_' + dateSuffix + file_ext
                tmp_path = "/tmp/folder/" + cfile
                b_path = cVal + cfile
                print (u'b path : ', b_path)
                s3.download_file("us-eu-Bucket", b_path, tmp_path)
                print ("TEMP PATH: ", tmp_path)

        if files_not_found:
            print("ALERT", "No file in {0}/{1}".format(bucket, b_folder))

downloadFiletest()

Error:

The script skips over the loop for fileId in cFiles.lower(): and then exits.

Goal:

Pull the file from S3 and download it to tmp_path to be used as desired. When pulling the file, I'd like the script to pick it based on ID and date. For instance:

Rule: Pseudo:

If S3 has the files 9919USEN_File_20180216.csv and 9919USEN_File_20180217.csv, then pick 9919USEN_File_20180217.csv to download. Also, if 991USEN_File_2018.csv is in S3, don't pick it, because it doesn't match the rules fileidreg = '[0-9]{4}[a-zA-Z]{4}' and dateIdReg = '[0-9]{8}'.

Rule: Visual:

9919USEN_File_20180217.csv > 9919USEN_File_20180216.csv [due to date]

9919USEN_File_20180217.csv > 991USEN_File_2018.csv [due to incorrect ID and date]

Upvotes: 0

Views: 2836

Answers (1)

yacub elmi

Reputation: 23

Solution

The issue was the way the code was structured. I've reorganized it and put the loop inside a try/except block. I've also used fileIDPrefix.search instead of fileIDPrefix.match, since match only looks for the pattern at the start of the string (index 0), which wasn't suitable for the problem at hand.

Final solution:
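To illustrate the difference between match and search, here is a minimal sketch. The key below is hypothetical, shaped like the keys in the question (folder path followed by the filename); it is not taken from the real bucket.

```python
import re

file_id_prefix = re.compile(r'[0-9]{4}[a-zA-Z]{4}')

# Hypothetical key (assumption): folder path + filename, as S3 would list it.
key = 'folder/example/9919USEN_File_20180216.csv'

# match() only tries the pattern at position 0, where the key starts with "folder/".
print(file_id_prefix.match(key))            # None
# search() scans the whole string and finds the ID after the folder part.
print(file_id_prefix.search(key).group())   # 9919USEN
```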

import sys
import boto3
import re

bucketname = u'us-eu-Bucket'  # bucket name as used in the question

#connect to s3
client = boto3.resource(u's3', aws_access_key_id=u'KEY',
                   aws_secret_access_key=u'TOKEN')

#used for downloading
s3 = boto3.client(u's3', aws_access_key_id=u'KEY',
                   aws_secret_access_key=u'TOKEN')

def downloadFiletest():
  date = '[0-9]{8}'  # fileDate regex
  dateSuffix = re.compile(date)  # regex used to check the date of the file
  reg = '[0-9]{4}[a-zA-Z]{4}'  # filename regex
  fileIDPrefix = re.compile(reg)  # check fileID of the Filename.

  folder = u"/folder/example/"  # directory
  bucket = client.Bucket(bucketname)  # bucket

  try:
      for cList in bucket.objects.filter(Prefix=folder):  # filter to the folder

          filenames = cList.key  # full key (folder + filename) of the current object
          print(filenames)

          # fixed positions of the fileID and the date within the key;
          # these offsets depend on the length of the real folder path
          fileID = filenames[33:41]
          fileDate = filenames[51:59]

          # check the length of each value, to be verified later.
          lenf = len(fileID)
          lenG = len(fileDate)
          old_file = cList.key
          dot_index = old_file.find(u'.')
          file_ext = old_file[dot_index:]

          # check that the key matches our rules; if it does, download and return.
          if fileIDPrefix.search(filenames) and dateSuffix.search(filenames):
              filename = fileID + u'_file_' + fileDate + file_ext
              tmp_path = "/tmp/mpcmt/" + filename
              file_path = folder + filename
              s3.download_file(bucketname, file_path, tmp_path)

              return filename, tmp_path, fileID, fileDate

          # the key did not match: check its lengths and positions
          # against what is expected and report what is wrong.
          if dot_index != 59:
              print('File has wrong fileID or Wrong Date')
          if lenG != 8:
              print('File has wrong fileDate Format')
          if lenf != 8:
              print('File has wrong fileID')

  except Exception as e:  # report the error and exit if listing or downloading fails.
      print("ALERT", "No file in {0}/{1}".format(bucketname, folder))
      # There was some issue / error / problem and that is why the program is exiting.
      print("No file in {0}/{1}".format(bucketname, folder), file=sys.stderr)
      print("Exception: %s" % str(e), file=sys.stderr)
      sys.exit(1)


downloadFiletest()
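The function above returns on the first key that passes both regexes, whereas the rule in the question asks for the newest date per ID. A minimal sketch of that selection step follows, working on plain key strings so it runs without S3. The helper name latest_key_per_id, the anchored pattern NAME_RE and the sample keys are assumptions for illustration, not part of the original code.

```python
import re

# Assumed full-name pattern: 4 digits + 4 letters, "_File_", 8-digit date, ".csv".
NAME_RE = re.compile(r'^(?P<id>[0-9]{4}[a-zA-Z]{4})_File_(?P<date>[0-9]{8})\.csv$')


def latest_key_per_id(keys):
    """Return {file_id: key} with the newest-dated valid key for each ID."""
    latest = {}
    for key in keys:
        basename = key.rsplit('/', 1)[-1]  # drop the folder part of the key
        m = NAME_RE.match(basename)
        if m is None:
            continue  # wrong ID or date format, so skip this key
        file_id, date = m.group('id'), m.group('date')
        # YYYYMMDD strings sort chronologically, so string comparison is enough.
        if file_id not in latest or date > latest[file_id][0]:
            latest[file_id] = (date, key)
    return {file_id: key for file_id, (date, key) in latest.items()}


# Hypothetical keys, built from the filenames in the question.
keys = [
    'folder/example/9919USEN_File_20180216.csv',
    'folder/example/9919USEN_File_20180217.csv',
    'folder/example/991USEN_File_2018.csv',
]
print(latest_key_per_id(keys))
# {'9919USEN': 'folder/example/9919USEN_File_20180217.csv'}
```

Each selected key could then be passed to s3.download_file(bucketname, key, tmp_path), with the keys list built from obj.key for each object returned by bucket.objects.filter(Prefix=folder).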

Upvotes: 1
