Underoos

Reputation: 5200

How to delete files that match a specific pattern in an S3 bucket?

I have an S3 bucket where I'm saving CSV files to load into Redshift. I'm using Python and Boto3 for this. After loading them into Redshift, I want to delete the specific files that match a pattern containing the processing ID for my code.

I'm saving my files into the S3 bucket as follows:

Redshift{processingID}-table1.csv
Redshift{processingID}-table2.csv
Redshift{processingID}-table3.csv
Redshift{processingID}-table4.csv

After processing the files that contain a specific ID, I want to delete the processed files from my S3 bucket. How do I specify the pattern?

This is the pattern of the files I'm trying to delete from the bucket:

Redshift11-*.csv. Here 11 is the processingID. How do I delete all the files that match this pattern using boto3?

I've come across this: https://stackoverflow.com/a/53836093/4626254

But it seems to search by the folder as a prefix and not by the exact pattern of the file.

Upvotes: 1

Views: 4606

Answers (2)

pyjamaboy

Reputation: 1

There's no way to tell S3 to delete files that match a specific pattern - you just have to delete them one file at a time. You can list keys with a specific prefix (e.g. Redshift, or an application name used as a prefix) by modifying your file naming to use a unique prefix.

Or, if you need to rely on regex, you can match each key against a pattern with start and end anchors, like:

import re

# Match keys like Redshift<processingID>-<table>.csv (escape the dot, anchor the end)
pattern = r"Redshift([0-9]+)-(\w+)\.csv$"
re.match(pattern, 'Redshift2-table1.csv')
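
Putting the two together, a minimal sketch might look like the following (the bucket name 'my-bucket' and processing ID 11 are placeholders): list keys under the prefix server-side, apply the regex client-side, and delete each match one at a time.

import re
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')  # placeholder bucket name

processing_id = 11  # placeholder processing ID
pattern = re.compile(rf"Redshift{processing_id}-(\w+)\.csv$")

# List keys with the prefix server-side, then apply the regex client-side
for obj in bucket.objects.filter(Prefix=f"Redshift{processing_id}-"):
    if pattern.match(obj.key):
        obj.delete()  # delete each matching object individually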

Hope this helps!

Upvotes: 0

jarmod

Reputation: 78850

You can do prefix filtering server-side but you'll have to do suffix-filtering client-side. For example:

import boto3
s3 = boto3.resource('s3')

bucket = s3.Bucket('mybucket')

# Prefix filtering happens server-side; suffix filtering happens client-side
files = [obj.key for obj in bucket.objects.filter(Prefix="myfolder/Redshift11-")]
csv_files = [file for file in files if file.endswith('.csv')]

print(f'All files: {files}')
print(f'CSV files: {csv_files}')
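
If you then want to delete the matching files, one possible follow-up (a sketch building on the csv_files list above; delete_objects accepts at most 1000 keys per request) is:

# Batch-delete the filtered keys (up to 1000 per call)
if csv_files:
    bucket.delete_objects(
        Delete={'Objects': [{'Key': key} for key in csv_files]}
    )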

Upvotes: 4
