Reputation: 5126
This is pretty basic, but I am not able to download a file given its S3 path.
For example, I have this path: s3://name1/name2/file_name.txt
import boto3
locations = ['s3://name1/name2/file_name.txt']
s3_client = boto3.client('s3')
bucket = 'name1'
prefix = 'name2'
for file in locations:
    s3_client.download_file(bucket, 'file_name.txt', 'my_local_folder')
I get this error: botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
The file does exist: I can download it with the AWS CLI using the S3 path s3://name1/name2/file_name.txt.
Upvotes: 8
Views: 44504
Reputation: 59
import os
import boto3

s3_client = boto3.client('s3')
locations = ['s3://name1/name2/file_name.txt']
for location in locations:
    # Split 's3://bucket/key' into the bucket name and the full object key.
    path_parts = location.replace('s3://', '').split('/')
    bucket = path_parts[0]
    object_key = '/'.join(path_parts[1:])
    # download_file expects a local file path, not a directory.
    local_path = 'my_local_folder/' + os.path.basename(object_key)
    s3_client.download_file(bucket, object_key, local_path)
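The string splitting above can also be done with the standard library's urllib.parse.urlparse, which puts the bucket in netloc and the key in path. This is only a minimal sketch: the helper name split_s3_uri is hypothetical and not part of boto3.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The parsed path keeps its leading slash; the object key should not.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://name1/name2/file_name.txt")
print(bucket, key)
```

The resulting bucket and key can be passed straight to s3_client.download_file. Note that urlparse treats '?' and '#' as query and fragment separators, so this sketch assumes keys that contain neither character.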
Upvotes: -1
Reputation: 1175
You may need to do this with some type of authentication. There are several methods, but creating a session is simple and fast:
from boto3.session import Session

bucket_name = 'your_bucket_name'
folder_prefix = 'your/path/to/download/files'
credentials = 'credentials.txt'

# The file holds a single line: <access key id>:<secret access key>
with open(credentials, 'r', encoding='utf-8') as f:
    line = f.readline().strip()
access_key = line.split(':')[0]
secret_key = line.split(':')[1]

session = Session(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key
)
s3 = session.resource('s3')
bucket = s3.Bucket(bucket_name)

# Download every object under the prefix into /tmp, keeping only the file name.
for s3_file in bucket.objects.filter(Prefix=folder_prefix):
    file_object = s3_file.key
    file_name = str(file_object.split('/')[-1])
    print('Downloading file {} ...'.format(file_object))
    bucket.download_file(file_object, '/tmp/{}'.format(file_name))
In the credentials.txt file, add a single line that joins the access key ID and the secret access key with a colon, for example:
~$ cat credentials.txt
AKIAIO5FODNN7EXAMPLE:ABCDEF+c2L7yXeGvUyrPgYsDnWRRC1AYEXAMPLE
Don't forget to protect this file on your host: make it read-only, and readable only by the user who runs this program. I hope it works for you; it works perfectly for me.
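As a rough illustration of the permission advice, the file can be restricted from Python with os.chmod. This sketch assumes a POSIX host, and the helper name make_owner_read_only is made up for the example.

```python
import os
import stat


def make_owner_read_only(path):
    """Restrict a file to read-only for its owner (mode 0400 on POSIX)."""
    os.chmod(path, stat.S_IRUSR)
    # Return the resulting permission bits so the caller can verify them.
    return stat.S_IMODE(os.stat(path).st_mode)
```

Calling make_owner_read_only('credentials.txt') once after creating the file should be enough; the shell equivalent is chmod 400 credentials.txt.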
Upvotes: 3
Reputation: 174748
You need a list of object keys (the full path of each file inside the bucket, here name2/file_name.txt), then modify your code as shown in the documentation:
import os
import boto3
import botocore

files = ['name2/file_name.txt']
bucket = 'name1'
s3 = boto3.resource('s3')

for file in files:
    try:
        s3.Bucket(bucket).download_file(file, os.path.basename(file))
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            print("The object does not exist.")
        else:
            raise
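The code above saves each object under its base name in the current directory. If the files should go into a folder such as my_local_folder from the question, the destination has to be a full file path inside an existing directory. A small sketch, where local_path_for is a hypothetical helper:

```python
import os


def local_path_for(key, dest_dir):
    """Return dest_dir/<base name of key>, creating dest_dir if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    return os.path.join(dest_dir, os.path.basename(key))
```

It would be used as the second argument of the download call, for example s3.Bucket(bucket).download_file(file, local_path_for(file, 'my_local_folder')).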
Upvotes: 12