Reputation: 1561
I have an S3 bucket that stores several log files named in the format index.log.yyyy-mm-dd-01, index.log.yyyy-mm-dd-02, and so on,
where yyyy is the year, mm the month, and dd the day.
Now I want to download only a few of them. I saw "Downloading an entire S3 bucket?". The accepted answer to that post works fine if I want to download the entire bucket, but what should I do if I want to do some pattern matching? I tried the following commands, but they didn't work:
aws s3 sync s3://mybucket/index.log.2014-08-01-* .
aws s3 sync 's3://mybucket/index.log.2014-08-01-*' .
I also tried using s3cmd for the download, following POINT 7 of the article at http://fosshelp.blogspot.in/2013/06 and http://s3tools.org/s3cmd-sync. I ran the following:
s3cmd -c myconf.txt get --exclude '*.log.*' --include '*.2014-08-01-*' s3://mybucket/ .
and a few more permutations of this.
Can anyone tell me why the pattern matching isn't happening, or whether there is another tool I need to use?
Thanks !!
Upvotes: 23
Views: 8505
Reputation: 599
I needed to grab files from an S3 access logs bucket, and I found the official AWS CLI tool to be very slow for that task, so I looked for alternatives.
https://github.com/peak/s5cmd worked great!
It supports globs, for example:
s5cmd -numworkers 30 cp 's3://logs-bucket/2022-03-30-19-*' .
It is also very fast, so you can work with buckets that hold S3 access logs without much fuss.
Upvotes: 0
Reputation: 1561
I found the solution to the problem, although I don't know why the other commands were not working. The solution is as follows:
aws s3 sync s3://mybucket . --exclude "*" --include "*.2014-08-01-*"
Note: --exclude "*" must come before the --include filter. Filters are applied in the order given and later ones take precedence, so with the reverse order the trailing --exclude "*" overrides the include and nothing is downloaded (I am unable to find the reference where I read this).
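To make the ordering rule concrete, here is a minimal sketch in plain shell that emulates "the last matching filter wins" for a single key. The function name s3_filter is hypothetical and is not part of the AWS CLI; it only approximates the behaviour described above, assuming every key starts out included and each filter that matches overrides the earlier verdict.

```shell
#!/bin/sh
# Hypothetical helper (NOT an AWS CLI command): decides whether one key
# would be included, assuming filters are evaluated left to right and the
# last filter that matches the key determines the result.
s3_filter() {
    key=$1
    shift
    verdict=include            # assumption: keys are included by default
    while [ $# -ge 2 ]; do
        case "$1" in
            --exclude)
                # $2 is left unquoted so the shell treats it as a glob pattern
                case "$key" in $2) verdict=exclude ;; esac ;;
            --include)
                case "$key" in $2) verdict=include ;; esac ;;
        esac
        shift 2
    done
    echo "$verdict"
}

# Exclude everything first, then re-include the wanted day.
s3_filter index.log.2014-08-01-01 --exclude '*' --include '*.2014-08-01-*'   # prints: include
s3_filter index.log.2014-08-02-01 --exclude '*' --include '*.2014-08-01-*'   # prints: exclude

# Reversed order: the trailing --exclude '*' overrides the include.
s3_filter index.log.2014-08-01-01 --include '*.2014-08-01-*' --exclude '*'   # prints: exclude
```

With the real command, adding --dryrun to aws s3 sync should list what would be transferred without downloading anything, which is a cheap way to check a filter combination before running it.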
Upvotes: 26