nu everest

Reputation: 10249

How to search an Amazon S3 Bucket using Wildcards?

This Stack Overflow answer helped a lot. However, I want to search for all PDFs inside a given bucket.

  1. I click "None".
  2. I start typing.
  3. I type *.pdf
  4. I press Enter.

Nothing happens. Is there a way to use wildcards or regular expressions to filter bucket search results via the online S3 GUI console?

Upvotes: 58

Views: 167114

Answers (8)

jkmartin

Reputation: 383

The CLI can do this: aws s3 only supports prefixes, but aws s3api supports arbitrary filtering. For S3 URIs such as s3://company-bucket/category/obj-foo.pdf, s3://company-bucket/category/obj-bar.pdf, and s3://company-bucket/category/baz.pdf, you can run

aws s3api list-objects --bucket "company-bucket" \
  --prefix "category/" \
  --query "Contents[?ends-with(Key, '.pdf')]"

or for a more general wildcard

aws s3api list-objects --bucket "company-bucket" \
  --prefix "category/" \
  --query "Contents[?contains(Key, 'foo')]"

or even

aws s3api list-objects --bucket "company-bucket" \
  --prefix "category/obj" \
  --query "Contents[?ends_with(Key, '.pdf') && contains(Key, 'ba')]"

The full query language is described in the JMESPath documentation.
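
The same JMESPath filtering is available from Python, since boto3 paginators accept an expression through their search() method. A minimal sketch, reusing the hypothetical bucket and prefix from the examples above:

import boto3

# Apply the JMESPath filter page by page; bucket and prefix are the
# same placeholder values used in the CLI examples above.
client = boto3.client('s3')
paginator = client.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='company-bucket', Prefix='category/')
for key in pages.search("Contents[?ends_with(Key, '.pdf')].Key"):
    if key:  # pages with no Contents yield None
        print(key)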

Upvotes: 5

Tech Support

Reputation: 1088

AWS CLI search: the AWS Console can only search within a single directory, and only by the prefix of the object name (an S3 Search limitation).

The best way is to use the AWS CLI with the command below on Linux:

aws s3 ls s3://bucket_name/ --recursive | grep search_word | cut -c 32- 

Searching for files with a pattern (note that grep takes a regular expression, not a shell wildcard):

aws s3 ls s3://bucket_name/ --recursive | grep '\.pdf$'

Upvotes: 38

Tiberius

Reputation: 11

My guess is that the files were uploaded from a Unix system and you're downloading to Windows, so s3cmd is unable to preserve file permissions, which don't apply on NTFS.

To search for files and grab them, try this from the target directory (or change ./ to your target):

for i in `s3cmd ls s3://bucket | grep "searchterm" | awk '{print $4}'`; do s3cmd sync --no-preserve $i ./; done

This works in WSL on Windows.
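
If s3cmd keeps misbehaving on Windows, a rough boto3 equivalent of the loop above, with the bucket name and search term as placeholders:

import os
import boto3

# List the bucket, match keys against a search term, and download the
# hits into the current directory (skipping folder-marker keys).
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
for obj in bucket.objects.all():
    if 'searchterm' in obj.key and not obj.key.endswith('/'):
        bucket.download_file(obj.key, os.path.basename(obj.key))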

Upvotes: 1

Philluminati

Reputation: 2789

The documentation for the Java SDK suggests it can be done:

https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html
https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingJava.html

Specifically, the listObjectsV2 call (which returns a ListObjectsV2Result) lets you specify a prefix filter, e.g. "files/2020-01-02", so you only return results matching today's date. Note that the prefix is a literal string, not a wildcard, so a trailing * would be matched literally.

https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/ListObjectsV2Result.html

Upvotes: 1

Michael Hohlios

Reputation: 1248

As stated in a comment, Amazon's UI can only be used to search by prefix as per their own documentation:

http://docs.aws.amazon.com/AmazonS3/latest/UG/searching-for-objects-by-prefix.html

There are other methods of searching, but they require a bit of effort. Just to name two options: the AWS CLI or Boto3 for Python.
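
To illustrate the Boto3 option, a minimal sketch with a placeholder bucket name; since S3 itself only filters by prefix, the wildcard has to be applied client-side:

import fnmatch
import boto3

# Page through the bucket's keys and apply the wildcard pattern locally,
# because the server-side API only supports literal prefix filtering.
s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket'):
    for obj in page.get('Contents', []):
        if fnmatch.fnmatch(obj['Key'], '*.pdf'):
            print(obj['Key'])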

I know this post is old, but it is high in Google's results for S3 searching and does not have an accepted answer. The other answer by Harish links to a dead site.

UPDATE 2020/03/03: The AWS link above has been removed. This link to a very similar topic was as close as I could find: https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html

Upvotes: 48

Deepak Tripathi

Reputation: 3233

I have used this in one of my projects, but it's a bit hard-coded:

import subprocess

bucket = "Abcd"
# Shell out to the AWS CLI and keep only the .csv entries; the dot is
# escaped so grep treats it literally rather than as "any character".
command = "aws s3 ls s3://" + bucket + "/sub_dir/ | grep '\\.csv'"
listofitems = subprocess.check_output(command, shell=True)
listofitems = listofitems.decode('utf-8')
# Each listing line ends with the key name; keep the last field of each line.
print([item.split(" ")[-1] for item in listofitems.split("\n")[:-1]])

Upvotes: -1

user11002455

Reputation: 141

You can use the copy function with the --dryrun flag:

aws s3 cp s3://your-bucket/any-prefix/ . --recursive --exclude "*" --include "*.pdf" --dryrun

It lists all of the files that are PDFs without actually copying anything.

Upvotes: 14

Matts

Reputation: 1341

If you use boto3 in Python, it's quite easy to find the files. Replace 'bucket' with the name of your bucket.

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
# Walk every object in the bucket and keep only the PDF keys.
for obj in bucket.objects.all():
    if obj.key.endswith('.pdf'):
        print(obj.key)
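
If the bucket is large, you can also narrow the listing server-side before matching; a variation on the loop above, assuming a hypothetical category/ prefix:

# Only list keys under the given prefix instead of scanning the whole bucket.
for obj in bucket.objects.filter(Prefix='category/'):
    if obj.key.endswith('.pdf'):
        print(obj.key)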

Upvotes: 7
