Reputation: 81
I am trying to download only specific files from AWS S3. I have the list of file URLs. With the CLI, I can only download all the files in a bucket using the --recursive flag, but I only want to download the files in my list. Any ideas on how to do that?
Upvotes: 7
Views: 18386
Reputation: 419
You can use a Python boto3 script for this; it downloads the files while preserving the folder structure of the S3 bucket.
import os

import boto3
import botocore.exceptions

# Initialize the S3 client
aws_access_key_id = 'AWSACCESSKEY'
aws_secret_access_key = 'AWSSECRETACCESSKEY'
s3 = boto3.client(
    's3',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
)

# List of object keys you want to download
image_keys_to_download = [
    "sta10/maths/chapter10.pdf",
    "sta11/science/biology/chapter08.pdf",
    "data/sta10/class11/scientists/pythagoras",
]

# Destination directory where the files will be downloaded
# (expanduser is needed so "~" resolves to your home directory)
destination_base_directory = os.path.expanduser("~/s3-download-class")

# Loop through the list of keys and download each object while preserving folder structure
for image_key in image_keys_to_download:
    try:
        # Extract the folder structure from the object key
        folder_structure = os.path.dirname(image_key)

        # Create the destination directory, including the folder structure
        destination_directory = os.path.join(destination_base_directory, folder_structure)
        os.makedirs(destination_directory, exist_ok=True)

        # Download the object from S3 and save it locally, preserving the folder structure
        local_file_path = os.path.join(destination_directory, os.path.basename(image_key))
        s3.download_file('class-10-data', image_key, local_file_path)  # Add your S3 bucket name here.
        print(f"Downloaded {image_key}")
    except botocore.exceptions.NoCredentialsError:
        print("AWS credentials not found. Make sure you have configured your credentials.")
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            print(f"Object {image_key} not found in the S3 bucket.")
        else:
            print(f"Error downloading {image_key}: {e}")

print("Download process completed.")
I hope this Python script lets you download specific files while keeping the same folder structure.
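Since the question mentions a list of file URLs rather than bare object keys, here is a minimal sketch of how you could adapt the script above to read s3://bucket/key URLs from a file. The file name file.list is hypothetical, and this version relies on the default AWS credential chain instead of hard-coded keys:

import os
from urllib.parse import urlparse

import boto3

s3 = boto3.client('s3')  # uses the default AWS credential chain

# Hypothetical input file: one s3://bucket/key URL per line
with open('file.list') as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    parsed = urlparse(url)  # for s3://bucket/key: netloc is the bucket, path is /key
    bucket = parsed.netloc
    key = parsed.path.lstrip('/')

    # Recreate the key's folder structure under a local base directory
    local_path = os.path.join('s3-download', key)
    os.makedirs(os.path.dirname(local_path), exist_ok=True)

    s3.download_file(bucket, key, local_path)
    print(f"Downloaded {url}")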
Upvotes: 0
Reputation: 621
Since you already have the S3 URLs in a file (say file.list), like -
s3://bucket/file1
s3://bucket/file2
You could download all the files to your current working directory with a simple bash loop -
while read -r line; do aws s3 cp "$line" .; done < file.list
Upvotes: 1
Reputation: 126
This is possibly a duplicate of: Selective file download in AWS S3 CLI
You can do something along the lines of:
aws s3 cp s3://BUCKET/ folder --exclude "*" --include "2018-02-06*" --recursive
https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
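The --include filter can be repeated, so if your list holds exact keys you can build the filters from the file itself. A minimal sketch (assuming a hypothetical keys.list file with one object key per line, relative to the bucket root, and that the AWS CLI is on your PATH):

import subprocess

# Hypothetical input file: one object key per line, relative to the bucket root
with open('keys.list') as f:
    keys = [line.strip() for line in f if line.strip()]

# Exclude everything, then re-include each listed key
cmd = ['aws', 's3', 'cp', 's3://BUCKET/', 'folder', '--recursive', '--exclude', '*']
for key in keys:
    cmd += ['--include', key]

subprocess.run(cmd, check=True)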
Upvotes: 8