shantanuo

Reputation: 32316

download multiple files from S3 bucket using prefix

How do I download these files from S3?

$ aws s3 ls s3://student162a/kagapa_logs/2022-11-08
2022-11-08 00:46:21        607 2022-11-08-00-46-20-D1F1689F5DFAA555
2022-11-08 04:25:12        554 2022-11-08-04-25-11-09852D4EBBA54CAA
2022-11-08 04:27:37        556 2022-11-08-04-27-36-6AB56DD0D92C6C50
2022-11-08 04:29:16        574 2022-11-08-04-29-15-E16FB6F8BAE53BA0
2022-11-08 04:30:08        554 2022-11-08-04-30-07-5BDEB31F5D2E673A
2022-11-08 04:33:40        580 2022-11-08-04-33-39-68883A634F09D12A
2022-11-08 04:38:41        574 2022-11-08-04-38-40-7CBCAAC2C825391B
2022-11-08 04:38:51        598 2022-11-08-04-38-50-F64BB1BFF1565114
2022-11-08 04:43:01        561 2022-11-08-04-43-00-852CE3A46A10FA8A
2022-11-08 09:29:13        572 2022-11-08-09-29-12-4487894C85BEA4A0
2022-11-08 11:13:25        453 2022-11-08-11-13-24-B15E1663350834D5
2022-11-08 11:21:13        436 2022-11-08-11-21-12-19C796E81A1630A5
2022-11-08 18:31:09        525 2022-11-08-18-31-08-79A1114CD6D2331D
2022-11-08 18:34:03        544 2022-11-08-18-34-02-936D7F146C21B0D9

I have tried both sync and cp, but neither seems to work.

$ aws s3 sync s3://student162a/kagapa_logs/2022-11-08 .

$ aws s3 cp  s3://student162a/kagapa_logs/2022-11-08* .

I do not want to use "GUI clients". Is it possible using command line?


Update:

This seems to work, but is there a better (faster) way to download by prefix?

#!/bin/sh
# Download each object whose key starts with the prefix, one file at a time.
for file in $(aws s3 ls s3://student162a/kagapa_logs/2022-11-08 | awk '{print $4}')
do
    aws s3 cp "s3://student162a/kagapa_logs/$file" .
done
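A variant that avoids parsing the aws s3 ls columns is to list keys with aws s3api list-objects-v2 and fan the downloads out with xargs -P. This is only a sketch; the concurrency of 10 is an arbitrary choice:

#!/bin/sh
# List keys by prefix via the s3api (no column parsing), then
# download up to 10 files in parallel with xargs.
aws s3api list-objects-v2 \
    --bucket student162a \
    --prefix kagapa_logs/2022-11-08 \
    --query 'Contents[].Key' \
    --output text \
  | tr '\t' '\n' \
  | xargs -P10 -I{} aws s3 cp "s3://student162a/{}" .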

The following parallel one-liner is faster than the shell script, but it still takes a lot of time when there are thousands of files:

aws s3 ls  s3://student162a/kagapa_logs/2022-11 | awk '{print $4}' | parallel -I% --max-args 1 aws s3 cp s3://student162a/kagapa_logs/% .
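GNU parallel reads stdin and substitutes {} by default, so the same pipeline can be written more compactly with an explicit job count (a variant of the line above; -j30 matches the 30 jobs used below):

aws s3 ls s3://student162a/kagapa_logs/2022-11 | awk '{print $4}' | parallel -j30 aws s3 cp s3://student162a/kagapa_logs/{} .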

I used this shell script to create a text file of all commands:

#!/bin/sh
# Emit one `aws s3 cp` command per matching object into myfile.txt.
for file in $(aws s3 ls s3://student162a/kagapa_logs/2022-11 | awk '{print $4}')
do
    echo "aws s3 cp s3://student162a/kagapa_logs/$file ." >> myfile.txt
done

And then used parallel command like this:

parallel --jobs 30 < myfile.txt

Generating the text file took no time at all, but the parallel command still took 10 minutes for 1,000 files. Am I missing something?
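One likely culprit: every aws s3 cp invocation starts a fresh CLI process and opens new connections, so 1,000 files means 1,000 cold starts regardless of how many run in parallel. A single recursive command keeps one process alive and parallelizes transfers internally; as a sketch, its transfer concurrency (default 10) can also be raised, the value 50 below being arbitrary:

aws configure set default.s3.max_concurrent_requests 50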


Update 2

Using the console, I searched for the prefix 2022-11-08, then selected all the files and copied them to another folder. This works when there are fewer than 300 files; beyond that I have to select the files on each page separately, so this option is not practical when a few thousand files need to be downloaded.

Upvotes: 1

Views: 4324

Answers (1)

devops_gagan

Reputation: 89

The AWS S3 CLI provides filters to include or exclude objects.

More information can be found at https://docs.aws.amazon.com/cli/latest/reference/s3/#use-of-exclude-and-include-filters

To download multiple files from an AWS bucket to your current directory, you can use the --recursive, --exclude, and --include flags. The order of the filters matters: filters that appear later in the command take precedence over earlier ones.

Example command:

aws s3 cp s3://my_bucket/ . --recursive --exclude "*" --include "prefix-a*"

Make sure to keep the include and exclude in the order you need.

In your case, the command should look like this:

aws s3 cp s3://student162a/kagapa_logs/ . --recursive --exclude "*" --include "2022-11-08*"
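For illustration, reversing the two filters silently matches nothing, because filters that appear later in the command take precedence:

aws s3 cp s3://student162a/kagapa_logs/ . --recursive --include "2022-11-08*" --exclude "*"

Here the trailing --exclude "*" overrides the earlier include, and no files are downloaded.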

Update: I tested a similar case on my bucket:

aws s3 cp s3://gagan-miller-bucket-bucket/dir1/ . --recursive --exclude "*" --include "2022-11-08*"

download: s3://gagan-miller-bucket-bucket/dir1/2022-11-08 004621.txt to ./2022-11-08 004621.txt 
download: s3://gagan-miller-bucket-bucket/dir1/2022-11-08 004621 - Copy (3).txt to ./2022-11-08 004621 - Copy (3).txt 
download: s3://gagan-miller-bucket-bucket/dir1/2022-11-08 004654.txt to ./2022-11-08 004654.txt
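The same filters also work with aws s3 sync, which additionally skips files that already exist locally, so a re-run only fetches new objects (a sketch using the bucket from the question):

aws s3 sync s3://student162a/kagapa_logs/ . --exclude "*" --include "2022-11-08*"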

Upvotes: 2
