Reputation: 5200
I have an S3 bucket where I save CSV files that are then loaded into Redshift. I'm using Python and Boto3 for this. After loading them into Redshift, I want to delete the files that match a pattern containing the processing ID for my code.
I'm saving my files into the S3 bucket as follows:
Redshift{processingID}-table1.csv
Redshift{processingID}-table2.csv
Redshift{processingID}-table3.csv
Redshift{processingID}-table4.csv
After processing the files that contain a specific ID, I want to delete those processed files from my S3 bucket. How do I specify the pattern?
This is the pattern I'm trying to use to delete the files from the bucket:
Redshift11-*.csv
Here 11 is the processingID. How do I delete all the files that match this pattern using boto3?
I've come across this: https://stackoverflow.com/a/53836093/4626254
But it seems to search for a folder as a prefix rather than for an exact file-name pattern.
Upvotes: 1
Views: 4606
Reputation: 1
There's no way to tell S3 to delete files that match a specific pattern - you have to delete them one key at a time. You can, however, list keys with a specific prefix (e.g. Redshift, or an application name used as a prefix) by modifying your file naming to start with a unique prefix.
If you need to rely on a regex instead, anchor the pattern with explicit start and end rules, for example:
import re

# Note the escaped dot, so ".csv" matches a literal dot rather than any character.
pattern = r"Redshift([0-9]+)-(\w+)\.csv$"
re.match(pattern, 'Redshift2-table1.csv')
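A small self-contained sketch (pure Python, no AWS calls; the key names simply mirror the question) of filtering a list of keys with an anchored regex, keeping only the ones for processing ID 11:

```python
import re

# Escape the dot so ".csv" matches literally; anchor at the end of the key.
pattern = re.compile(r"Redshift([0-9]+)-(\w+)\.csv$")

keys = [
    "Redshift11-table1.csv",
    "Redshift11-table2.csv",
    "Redshift2-table1.csv",
    "notes.txt",
]

# Keep only keys that match the pattern AND whose captured ID is 11.
matched = [k for k in keys if (m := pattern.match(k)) and m.group(1) == "11"]
print(matched)  # ['Redshift11-table1.csv', 'Redshift11-table2.csv']
```

The same `matched` list can then be fed to a delete call, since S3 itself only filters by prefix.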
Hope this helps!
Upvotes: 0
Reputation: 78850
You can do prefix filtering server-side but you'll have to do suffix-filtering client-side. For example:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')

# Server-side: list only the keys under the given prefix.
files = [obj.key for obj in bucket.objects.filter(Prefix="myfolder/Redshift11-")]
# Client-side: keep only the keys ending in .csv.
csv_files = [file for file in files if file.endswith('.csv')]

print(f'All files: {files}')
print(f'CSV files: {csv_files}')
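To then delete the matched keys, one sketch (the bucket name and helper names are illustrative, not part of the answer above): S3 has no pattern-based delete, but `delete_objects` accepts up to 1000 keys per request, so the keys can be deleted in batches rather than one call per object:

```python
def chunk_delete_payloads(keys, batch_size=1000):
    """Split keys into delete_objects payloads (S3 caps each request at 1000 keys)."""
    return [
        {'Objects': [{'Key': k} for k in keys[i:i + batch_size]]}
        for i in range(0, len(keys), batch_size)
    ]

def delete_keys(bucket_name, keys):
    """Batch-delete keys from a bucket; assumes AWS credentials are configured."""
    import boto3  # imported here so chunk_delete_payloads stays usable offline
    bucket = boto3.resource('s3').Bucket(bucket_name)
    for payload in chunk_delete_payloads(keys):
        bucket.delete_objects(Delete=payload)

# Usage with the csv_files list built above (hypothetical bucket name):
# delete_keys('mybucket', csv_files)
```

Batching matters mostly when a processing run produces many files; for a handful of CSVs a loop of `obj.delete()` calls works just as well.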
Upvotes: 4