Reputation: 10249
This Stack Overflow answer helped a lot. However, I want to search for all PDFs inside a given bucket. I type *.pdf into the search box and press Enter, but nothing happens. Is there a way to use wildcards or regular expressions to filter bucket search results via the online S3 GUI console?
Upvotes: 58
Views: 167114
Reputation: 383
The CLI can do this; aws s3 only supports prefixes, but aws s3api supports arbitrary filtering. For S3 links that look like s3://company-bucket/category/obj-foo.pdf, s3://company-bucket/category/obj-bar.pdf, and s3://company-bucket/category/baz.pdf, you can run
aws s3api list-objects --bucket "company-bucket" \
--prefix "category/" \
--query "Contents[?ends-with(Key, '.pdf')]"
or for a more general wildcard
aws s3api list-objects --bucket "company-bucket" \
--prefix "category/" \
--query "Contents[?contains(Key, 'foo')]"
or even
aws s3api list-objects --bucket "company-bucket" \
--prefix "category/obj" \
--query "Contents[?ends_with(Key, '.pdf') && contains(Key, 'ba')]"
The full query language is described in the JMESPath documentation.
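If you would rather stay in Python, here is a minimal sketch of the same filter using boto3 and the jmespath library that boto3 depends on; the bucket and prefix are the same placeholders as above, and configured credentials are assumed.
import boto3
import jmespath

s3 = boto3.client("s3")
# list_objects_v2 returns up to 1,000 keys per call; use a paginator for larger buckets
resp = s3.list_objects_v2(Bucket="company-bucket", Prefix="category/")
# The same JMESPath expression the CLI --query flag evaluates
pdf_keys = jmespath.search("Contents[?ends_with(Key, '.pdf')].Key", resp) or []
print(pdf_keys)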
Upvotes: 5
Reputation: 1088
AWS CLI search: In the AWS Console, you can only search for objects within the current directory, not recursively across subdirectories, and only by the prefix of the file name (an S3 search limitation).
The best way is to use the AWS CLI with the command below on Linux:
aws s3 ls s3://bucket_name/ --recursive | grep search_word | cut -c 32-
Searching for files with a wildcard-style pattern:
aws s3 ls s3://bucket_name/ --recursive | grep '\.pdf$'
Upvotes: 38
Reputation: 11
My guess is the files were uploaded from a Unix system and you're downloading to Windows, so s3cmd is unable to preserve file permissions, which don't apply on NTFS.
To search for files and grab them, try this from the target directory (or change ./ to the target):
for i in `s3cmd ls s3://bucket | grep "searchterm" | awk '{print $4}'`; do s3cmd sync --no-preserve $i ./; done
This works in WSL on Windows.
Upvotes: 1
Reputation: 2789
The documentation for the Java SDK suggests it can be done:
https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html
https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingJava.html
Specifically, the listObjectsV2 call (whose results come back as a ListObjectsV2Result) lets you specify a prefix filter, e.g. "files/2020-01-02", so you only return results whose keys start with today's date. Note that the prefix is matched literally, not as a wildcard.
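For comparison, here is a minimal sketch of the same prefix filter in Python with boto3 (the other language used on this page); the Java SDK's prefix parameter behaves the same way, and the bucket and prefix names are placeholders.
import boto3

s3 = boto3.client('s3')
# Prefix is a literal key prefix, not a wildcard pattern
resp = s3.list_objects_v2(Bucket='your-bucket', Prefix='files/2020-01-02')
for obj in resp.get('Contents', []):
    print(obj['Key'])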
Upvotes: 1
Reputation: 1248
As stated in a comment, Amazon's UI can only be used to search by prefix as per their own documentation:
http://docs.aws.amazon.com/AmazonS3/latest/UG/searching-for-objects-by-prefix.html
There are other methods of searching, but they require a bit of effort. Just to name two options: the AWS CLI or Boto3 for Python.
I know this post is old, but it is high on Google's list for S3 searching and does not have an accepted answer. The other answer by Harish links to a dead site.
UPDATE 2020/03/03: The AWS link above has been removed. This link to a very similar topic was as close as I could find: https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html
Upvotes: 48
Reputation: 3233
I have used this in one of my projects, but it's a bit hard-coded:
import subprocess

bucket = "Abcd"
# List everything under sub_dir/ and keep only the lines containing ".csv"
command = "aws s3 ls s3://" + bucket + "/sub_dir/ | grep '.csv'"
listofitems = subprocess.check_output(command, shell=True)
listofitems = listofitems.decode('utf-8')
# Each line is "<date> <time> <size> <key>"; keep just the key
print([item.split(" ")[-1] for item in listofitems.split("\n")[:-1]])
Upvotes: -1
Reputation: 141
You can use the copy function with the --dryrun flag:
aws s3 cp s3://your-bucket/any-prefix/ ./ --recursive --exclude "*" --include "*.pdf" --dryrun
This lists all of the files that are PDFs without actually copying anything.
Upvotes: 14
Reputation: 1341
If you use boto3 in Python, it's quite easy to find the files. Replace 'bucket' with the name of your bucket.
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
for obj in bucket.objects.all():
    if '.pdf' in obj.key:
        print(obj.key)
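A small variant of the same idea, added here as a sketch: restrict the listing to a key prefix and match the .pdf suffix exactly, so names like report.pdf.bak are not picked up. The bucket and prefix names are placeholders.
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
# Listing only keys under a given prefix avoids scanning the whole bucket
for obj in bucket.objects.filter(Prefix='category/'):
    if obj.key.endswith('.pdf'):
        print(obj.key)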
Upvotes: 7