Rama Feichu

Reputation: 65

AWS CLI: Copy certain number of files from each bucket to local

I'm trying to download the first 10 files (10 is just an example) from every bucket. I'm new to this; I've read the AWS CLI documentation but couldn't find anything about it.

Upvotes: 0

Views: 1691

Answers (3)

Anandkumar

Reputation: 1502

If you don't want to write a script and the files you want follow a pattern (like A*.csv), you can copy by pattern instead. I know the question asks for a certain number of files, but sometimes all you need is an arbitrary subset of files to test with.

The command below was very useful for me:

aws s3 cp s3://noaa-gsod-pds/2022/ s3://<target_bucket_name>/2022/ --recursive --exclude '*' --include 'A*.csv'

If you do want to write a script, the command below lists 10 object keys from an S3 bucket; a script can then act on (for example, copy) each of them:

aws s3api list-objects --max-items 10 --bucket noaa-gsod-pds | jq '.Contents[].Key'
  • noaa-gsod-pds is a public bucket with a sample dataset
  • jq must be installed for the above command to work

Upvotes: -1

John Rotenstein

Reputation: 269101

There is no single command that can do this.

However, you could combine several AWS CLI commands together.

For example, these commands will LIST the first 10 objects:

# Quote the --query expressions so the shell does not treat [ ] as glob patterns
for bucket in $(aws s3api list-buckets --query 'Buckets[].[Name]' --output text)
do
  echo "Bucket: $bucket"
  aws s3api list-objects --query 'Contents[0:10].[Key]' --bucket "$bucket" --output text
done

First, it obtains a list of buckets; then, for each bucket, it lists the names of the first 10 objects.

You could modify it by adding another for loop that calls aws s3 cp with each filename to download the objects.

You should also consider what you'd like to do about clashing filenames (for example, if a file with the same name appears in the first 10 files of more than one bucket).

Upvotes: 0

jarmod

Reputation: 78573

The awscli has two broad sets of functions for S3: aws s3 and aws s3api. The former is higher level (it includes sync features, for example), while the latter maps closely to the underlying S3 APIs.

You can script a simple solution that uses aws s3api list-objects --max-items 10 to get a list of at most 10 objects from the bucket, and then copies them one-by-one.
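
A minimal sketch of such a script might look like the following. The function name download_first_n, its two arguments, and the downloads/ directory layout are made up for illustration; only the aws s3api list-buckets, aws s3api list-objects, and aws s3 cp calls come from the AWS CLI. It assumes your credentials are allowed to list and read every bucket, and it puts each bucket's files in its own subdirectory so that identical names in different buckets do not overwrite each other.

```shell
#!/usr/bin/env bash
# Sketch only: download up to N objects from every bucket into DEST/<bucket>/.
# download_first_n, N and DEST are hypothetical names, not AWS CLI features.

download_first_n() {
  local n
  local dest
  local bucket
  local key
  n="${1:-10}"          # how many objects to take per bucket
  dest="${2:-downloads}" # local directory that receives the files

  # Bucket names contain no whitespace, so word splitting is safe here.
  for bucket in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
    echo "Bucket: $bucket"

    # Print at most n keys, one per line.
    aws s3api list-objects --bucket "$bucket" --max-items "$n" \
        --query 'Contents[].[Key]' --output text |
    while IFS= read -r key; do
      case "$key" in
        # Skip blank lines, the literal "None" that text output can print
        # (for example for an empty bucket), and "folder" placeholder keys.
        # Caveat: an object whose key is exactly "None" is skipped too.
        ''|None|*/) continue ;;
      esac
      # Copy one object at a time, keeping each bucket in its own directory.
      aws s3 cp "s3://$bucket/$key" "$dest/$bucket/$key"
    done
  done
}
```

After sourcing the file, something like download_first_n 10 ./downloads would run it. Note that "first 10" here simply means the first keys S3 returns from the listing, not the newest or largest objects.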

Upvotes: 2
