giulianoBarbosa

Reputation: 63

Why is boto3.client.download_file appending a string to the end of the file name?

I need to download files from S3, and I wrote this code:

import os
import boto3

# This function downloads the files in a bucket
def download_dir(s3: boto3.client, bucket: str, directory: str = None) -> None:

    # Check whether the local bucket directory exists
    if not os.path.exists(bucket):
        # Create the bucket directory
        os.makedirs(bucket)

    # Iterate over the list of objects in the bucket
    for s3_key in s3.list_objects(Bucket=bucket)['Contents']:

        file_name = os.path.join(bucket, s3_key['Key'])
        # If the object is not a directory
        if not s3_key['Key'].endswith("/"):
            # Check whether the file was already downloaded
            if not os.path.exists(file_name):
                print(s3_key['Key'])
                real_file_name = s3_key['Key']
                print(real_file_name)
                try:
                    s3.download_file(Bucket=bucket, Key=s3_key['Key'], Filename=file_name)
                except:
                    print(type(real_file_name))
                    s3.download_file(Bucket=bucket, Filename=file_name, Key=real_file_name)
        # If the object is a directory
        else:
            # If the directory doesn't exist
            if not os.path.exists(file_name):
                # Create the directory
                os.makedirs(file_name)

s3 = boto3.client('s3',
                verify=False,
                aws_access_key_id=aws_dict['aws_access_key_id'], 
                aws_secret_access_key=aws_dict['aws_secret_access_key'], 
                aws_session_token=aws_dict['aws_session_token'],
                region_name=aws_dict['region_name'],
                config=config
                )

download_dir(s3, 'MY-BUCKET')

But for one specific file, another string is somehow appended to the end of the file name, which raises an exception:

Traceback (most recent call last):
  File "aws_transfer.py", line 58, in <module>
    download_dir(s3, 'bucket')
  File "aws_transfer.py", line 29, in download_dir
    s3.download_file(Bucket=bucket, Filename=file_name, Key=real_file_name)
  File "/home/gbarbo3/.local/lib/python3.8/site-packages/boto3/s3/inject.py", line 171, in download_file
    return transfer.download_file(
  File "/home/gbarbo3/.local/lib/python3.8/site-packages/boto3/s3/transfer.py", line 315, in download_file
    future.result()
  File "/home/gbarbo3/.local/lib/python3.8/site-packages/s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "/home/gbarbo3/.local/lib/python3.8/site-packages/s3transfer/futures.py", line 265, in result
    raise self._exception
  File "/home/gbarbo3/.local/lib/python3.8/site-packages/s3transfer/tasks.py", line 126, in __call__
    return self._execute_main(kwargs)
  File "/home/gbarbo3/.local/lib/python3.8/site-packages/s3transfer/tasks.py", line 150, in _execute_main
    return_value = self._main(**kwargs)
  File "/home/gbarbo3/.local/lib/python3.8/site-packages/s3transfer/download.py", line 571, in _main
    fileobj.seek(offset)
  File "/home/gbarbo3/.local/lib/python3.8/site-packages/s3transfer/utils.py", line 367, in seek
    self._open_if_needed()
  File "/home/gbarbo3/.local/lib/python3.8/site-packages/s3transfer/utils.py", line 350, in _open_if_needed
    self._fileobj = self._open_function(self._filename, self._mode)
  File "/home/gbarbo3/.local/lib/python3.8/site-packages/s3transfer/utils.py", line 261, in open
    return open(filename, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'bucket/folder/model.tar.gz.c40fF924'

The real file name should be 'bucket/folder/model.tar.gz'.

Can anyone help me with that?

Upvotes: 5

Views: 6602

Answers (1)

John Rotenstein

Reputation: 269330

Your program is having problems with sub-directories.

First, an explanation...

Amazon S3 does not use directories. For example, you could run this command to upload a file:

aws s3 cp foo.txt s3://bucketname/folder1/foo.txt

The object would be created in the Amazon S3 bucket with a Key of folder1/foo.txt. If you view the bucket in the S3 management console, the folder1 directory would 'appear', but it doesn't actually exist. If you were to delete that object, the folder1 directory would 'disappear' because it never actually existed.
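To make the "folders are only key prefixes" point concrete, here is a minimal sketch in plain Python (no AWS calls) that derives the folder names a '/'-delimited view would display from a list of keys. `implied_folders` is a hypothetical helper, and it assumes '/' is the only delimiter, which is the convention the console uses.

```python
def implied_folders(keys):
    """Return the folder-like prefixes implied by a list of S3 object keys.

    S3 keys are flat strings in which '/' is an ordinary character; folders
    only appear when a view splits keys on '/'.
    """
    folders = set()
    for key in keys:
        # Every segment before the last '/' belongs to a folder prefix.
        parts = key.split("/")[:-1]
        for i in range(1, len(parts) + 1):
            folders.add("/".join(parts[:i]) + "/")
    return folders


keys = ["folder1/foo.txt"]  # the only object in the bucket
print(implied_folders(keys))  # {'folder1/'}
print("folder1/" in keys)     # False: no object is named 'folder1/'
```

The folder shows up in the view, yet no key called `folder1/` exists, which is exactly the situation the code in the question does not handle.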

However, there is also a button in the S3 management console called Create folder. It will create a zero-length object with the name of the 'folder' (e.g. folder1/). This will 'force' the (pretend) directory to appear, but it still doesn't actually exist.
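A sketch of what such a folder object amounts to, assuming `s3` is a boto3 S3 client: a `put_object` call with a key ending in `/` and an empty body. `create_folder_marker` and `is_folder_marker` are hypothetical helpers, and the `Size` check assumes entries shaped like those in the `Contents` list returned by `list_objects`.

```python
def create_folder_marker(s3, bucket: str, folder: str) -> str:
    """Create a zero-length object whose key ends in '/', like a console 'folder'."""
    key = folder.rstrip("/") + "/"
    s3.put_object(Bucket=bucket, Key=key, Body=b"")
    return key


def is_folder_marker(obj: dict) -> bool:
    """Return True for a listed entry (from 'Contents') that only marks a folder."""
    return obj["Key"].endswith("/") and obj.get("Size", 0) == 0
```

For example, `create_folder_marker(boto3.client("s3"), "bucketname", "folder1")` would create the key `folder1/`. Nothing forces such marker objects to exist, so a check like `is_folder_marker` only finds folders that someone explicitly created this way.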

Your code is specifically checking whether such an object exists in this line:

if not s3_key['Key'].endswith("/"):

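The assumption is that there will always be an object with the name of the directory. However, that is not necessarily true (as shown in my example above). Therefore, the program never creates the directory, and it fails when attempting to download an object into a directory that does not exist on your computer. The extra string in the error (.c40fF924) is not part of the key: boto3's transfer manager first writes the download to a temporary file with a random suffix in the target directory and renames it to the final name once the download completes, so opening that temporary file is the first thing that fails.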
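The failure can be reproduced without S3 at all: it is an ordinary `open()` on a path whose parent directory does not exist. This is a minimal standard-library sketch; `open_download_target` is a hypothetical helper, and the file name simply mirrors the one in the traceback.

```python
import os
import tempfile


def open_download_target(path: str) -> bool:
    """Try to create `path` for writing; return False if its parent directory is missing."""
    try:
        with open(path, "wb"):
            pass
    except FileNotFoundError:
        return False
    return True


with tempfile.TemporaryDirectory() as root:
    # 'bucket/folder' was never created locally, as when the bucket
    # has no 'folder/' marker object.
    target = os.path.join(root, "bucket", "folder", "model.tar.gz.c40fF924")
    print(open_download_target(target))  # False
    os.makedirs(os.path.dirname(target))
    print(open_download_target(target))  # True
```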

Your program would need to test the existence of the target directory on your local computer before downloading each object. It cannot rely on there always being an object with a Key that ends with a / for every directory in the bucket.
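One way to do that, sketched under the assumption that `s3` is a boto3 S3 client: derive each object's local path from its key and create the parent directory with `os.makedirs(..., exist_ok=True)` right before `download_file`. This version also switches to the `list_objects_v2` paginator, because a single `list_objects` call returns at most 1,000 objects. The `local_root` parameter is a hypothetical addition.

```python
import os


def download_dir(s3, bucket: str, local_root: str = None) -> None:
    """Download every object in `bucket` under `local_root` (defaults to the bucket name).

    Parent directories are created from each object's key, so this does not
    depend on zero-length 'folder/' marker objects existing in the bucket.
    """
    local_root = local_root or bucket
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # 'Contents' is absent from a page that lists no objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            local_path = os.path.join(local_root, *key.split("/"))
            if key.endswith("/"):
                # Folder marker object: only make sure the directory exists.
                os.makedirs(local_path, exist_ok=True)
                continue
            # Create the local parent directory before downloading.
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            if not os.path.exists(local_path):
                s3.download_file(Bucket=bucket, Key=key, Filename=local_path)
```

For example, `download_dir(boto3.client("s3"), "MY-BUCKET")` would mirror the bucket into a local `MY-BUCKET` directory. Because `exist_ok=True` makes directory creation idempotent, the marker-object branch is only needed to reproduce empty folders.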

Upvotes: 7
