Reputation: 8316
Can I somehow search objects in S3 by extension, not only by prefix?
Here is what I have now:
ListObjectsResponse r = s3Client.ListObjects(new Amazon.S3.Model.ListObjectsRequest()
{
    BucketName = BucketName,
    Marker = marker,
    Prefix = folder,
    MaxKeys = 1000
});
So, I need to list all *.xls files in my bucket.
Upvotes: 25
Views: 46885
Reputation: 5678
I always search for objects by suffix with the following approach: fetch a list of all the objects in the bucket under the specified path (recursively, including subdirectories), then filter on the given suffix:
aws s3 ls s3://[BUCKET_NAME]/[DIRECTORY_NAME]/ --recursive | grep "[SUFFIX]"
Based on the above approach, I implemented a similar solution in the desired programming language.
Looking at the code in your question, it appears to be C# (.NET). So, in your case, the solution would be as follows:
var request = new ListObjectsRequest
{
    BucketName = [BUCKET_NAME],
    Prefix = [DIRECTORY_NAME],
    MaxKeys = 1000
};
ListObjectsResponse response;
string marker = null;
do
{
    request.Marker = marker;
    response = s3Client.ListObjects(request);
    var filteredObjects = response.S3Objects
        .FindAll(obj => obj.Key.EndsWith("[SUFFIX]"));
    foreach (var obj in filteredObjects)
    {
        Console.WriteLine($"Object Key: {obj.Key}, Size: {obj.Size}");
    }
    marker = response.NextMarker;
} while (response.IsTruncated);
Note: Don't forget to replace [BUCKET_NAME], [DIRECTORY_NAME], and [SUFFIX] in the code snippets above. Also, it is assumed that all the necessary AWS SDK libraries have been imported and that the AWS S3 client already exists in the code.
Upvotes: 1
Reputation: 43
I believe this could help someone.
I found a way to do it using the ends_with JMESPath function.
In my case specifically, I was trying to filter by prefix and suffix at the same time, and this worked for me:
aws s3api list-objects --bucket my-bucket --query "Contents[?ends_with(Key, 'my-suffix')]" --prefix "my-prefix"
If you only need the suffix:
aws s3api list-objects --bucket my-bucket --query "Contents[?ends_with(Key, 'my-suffix')].Key"
For some reason, if the --prefix flag isn't set and the "my-suffix" value contains a number (e.g. "mp4", "mp3"), it won't work. But if the suffix value doesn't contain a number (e.g. "json", "sh", "txt") or the --prefix flag is set, it works fine.
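The same ends_with query can be run from Python, since boto3 paginators accept a JMESPath expression through search(). This is only a minimal sketch: the helper names suffix_query and list_keys_with_suffix are made up for illustration, and as with --query in the CLI, the filtering happens on the client after each page is downloaded, not inside S3.

```python
def suffix_query(suffix):
    """Build a JMESPath expression selecting keys that end with `suffix`."""
    # Hypothetical helper. A single quote would break the JMESPath literal.
    if "'" in suffix:
        raise ValueError("suffix must not contain a single quote")
    return f"Contents[?ends_with(Key, '{suffix}')].Key"


def list_keys_with_suffix(bucket, suffix, prefix=""):
    """Return every key under `prefix` in `bucket` that ends with `suffix`."""
    import boto3  # imported lazily so suffix_query works without boto3 installed

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix)
    # search() yields None for a page that has no "Contents", so skip those.
    return [key for key in pages.search(suffix_query(suffix)) if key is not None]
```

Calling list_keys_with_suffix('my-bucket', '.xls', 'my-prefix') would need valid AWS credentials and an existing bucket.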
Upvotes: 4
Reputation: 191
If you're simply searching, you can probably find them by using a combination of awscli and grep as follows:
aws s3 ls s3://<your-bucket-name> --recursive | grep <your-file-extension>
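One caveat with a plain grep is that it matches the extension anywhere in the line, so a folder named "xls-notes" would also match. A rough sketch of a stricter filter in Python, assuming each line of the ls output has the layout "date time size key" (the function name keys_with_extension is made up for illustration):

```python
def keys_with_extension(ls_output, extension):
    """Return keys from `aws s3 ls --recursive` output that end with `extension`."""
    keys = []
    for line in ls_output.splitlines():
        # Assumed line layout: date, time, size, key (the key may contain spaces).
        parts = line.split(None, 3)
        if len(parts) == 4 and parts[3].endswith(extension):
            keys.append(parts[3])
    return keys
```

The captured output of the command above can be passed in as a single string, for example after redirecting it to a file and reading that file.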
Upvotes: 0
Reputation: 21
You can easily list all the elements by extension by getting all the elements (including folders) and then filtering with key.endswith('...'):
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('your-route')
test_dir = 'path/to/dir/'  # placeholder: the prefix ("folder") to search under
# S3 applies the Prefix filter; the endswith check on the key runs client-side
for obj in bucket.objects.filter(Prefix=test_dir):
    if obj.key.endswith('.zicu'):
        print('Value of object: ', obj.key)
In this case I'm filtering each element with a Prefix (test_dir) and then showing just the elements with the .zicu extension.
Upvotes: -1
Reputation: 59
When you use the boto3 resource to get objects from S3, you can filter on each returned key's file extension to get what you want. Like this:
import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_bucket')
files = my_bucket.objects.all()
file_list = []
for file in files:
    if file.key.endswith('.docx'):
        file_list.append(file.key)
You can change the endswith string to whatever you want.
Upvotes: 3
Reputation: 3863
I iterate over the objects after fetching their information. The end result is a list of dicts.
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket_name')
# get information about all files in the bucket
files = bucket.objects.all()
# create an empty list for the final information
files_information = []
# your known extensions list; file names are compared against it
extensions = ['png', 'jpg', 'txt', 'docx']
# Iterate through 'files', convert each to a dict, and add an extension key.
for file in files:
    # take the text after the last dot, so 4-letter extensions such as 'docx' also match
    extension = file.key.rsplit('.', 1)[-1]
    if extension in extensions:
        files_information.append({'file_name': file.key, 'extension': extension})
    else:
        files_information.append({'file_name': file.key, 'extension': 'unknown'})
print(files_information)
Upvotes: 2
Reputation: 4839
While I do think the BEST answer is to use a database to keep track of your files for you, I also think it's an incredible pain in the ass. I was working in Python with boto3, and this is the solution I came up with.
It's not elegant, but it will work. List all the files, and then filter it down to a list of the ones with the "suffix"/"extension" that you want in code.
import boto3
s3_client = boto3.client('s3')
bucket = 'my-bucket'
prefix = 'my-prefix/foo/bar'
paginator = s3_client.get_paginator('list_objects_v2')
response_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)
file_names = []
for response in response_iterator:
    # 'Contents' is absent when a page has no matching objects
    for object_data in response.get('Contents', []):
        key = object_data['Key']
        if key.endswith('.json'):
            file_names.append(key)
print(file_names)
Upvotes: 22
Reputation: 2068
You don't actually need a separate database to do this for you.
S3 gives you the ability to list objects in a bucket with a certain prefix. Your dilemma is that the ".xls" extension is at the end of the file name, therefore, prefix search doesn't help you. However, when you put the file into the bucket, you can change the object name so that the prefix contains the file type (for example: XLS-myfile.xls). Then, you can use the S3 API listObjects and pass a prefix of "XLS".
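As a rough illustration of that naming scheme, the key could be derived from the file name at upload time. This is only a sketch: the helper name typed_key and the fallback for names without an extension are assumptions, not part of any S3 API.

```python
import os


def typed_key(filename):
    """Prepend the upper-cased extension, e.g. 'myfile.xls' -> 'XLS-myfile.xls'."""
    extension = os.path.splitext(filename)[1].lstrip(".")
    if not extension:
        # Assumed fallback: names without an extension keep their original key.
        return filename
    return f"{extension.upper()}-{filename}"
```

The result would be used as the object key when uploading, and the listing call would then pass a prefix of "XLS-" to get only the spreadsheet files.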
Upvotes: 6
Reputation: 18832
I don't believe this is possible with S3.
The best solution is to 'index' S3 using a database (SQL Server, MySQL, SimpleDB, etc.) and do your queries against that.
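A minimal sketch of what such an index might look like, using an in-memory SQLite database as a stand-in for whichever database you pick. The table layout and the function names build_index and find_by_extension are assumptions made for illustration; in practice the key list would come from an S3 listing or be updated on every upload.

```python
import sqlite3


def build_index(keys):
    """Store each object key with its lower-cased extension in an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (key TEXT PRIMARY KEY, extension TEXT)")
    conn.execute("CREATE INDEX idx_extension ON objects (extension)")
    rows = []
    for key in keys:
        name = key.rsplit("/", 1)[-1]  # last path segment of the key
        extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        rows.append((key, extension))
    conn.executemany("INSERT OR REPLACE INTO objects VALUES (?, ?)", rows)
    return conn


def find_by_extension(conn, extension):
    """Return all indexed keys with the given extension, sorted by key."""
    cursor = conn.execute(
        "SELECT key FROM objects WHERE extension = ? ORDER BY key",
        (extension.lower(),),
    )
    return [row[0] for row in cursor]
```

With the index in place, a query such as find_by_extension(conn, 'xls') no longer needs to list the whole bucket.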
Upvotes: 18