Reputation: 71
I'm trying to delete multiple (as in: thousands of) files from an Amazon S3 bucket. I have the file names listed in a file like so:
name1.jpg
name2.jpg
...
name2020201.jpg
I tried the following solution:
aws s3 rm s3://test-bucket --recursive --exclude "*" --include "data/*.*"
from this question, but --include only takes one argument.
I tried to get hacky and list the names one by one, like --include "name1.jpg", but this does not work either.
This approach does not work either:
aws s3 rm s3://test-bucket < file.txt
Can you help?
Upvotes: 1
Views: 2251
Reputation: 71
The following approach is much faster, since my first answer took ages to complete.
My first approach was to delete one object at a time with the rm command. That is not efficient: after around 15 hours (!) it had deleted only about 40,000 objects, roughly 1/5 of the total.
This approach by Norbert Preining is way faster. As he explains, it uses the s3api command delete-objects, which can bulk-delete objects in storage. That command takes a JSON object as an argument. To turn the list of file names into the required JSON object, the script uses the JSON processor jq (read more here). The script processes 500 records per iteration.
cat file-with-names | while mapfile -t -n 500 ary && ((${#ary[@]})); do
    # Build the delete-objects payload: {"Objects":[{"Key":"<name>"}, ...]}
    objdef=$(printf '%s\n' "${ary[@]}" | ./jq-win64.exe -nR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}')
    # Delete up to 500 objects in a single API call
    aws s3api --no-cli-pager delete-objects --bucket BUCKET --delete "$objdef"
done
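If you want to sanity-check the jq step in isolation before pointing it at the real bucket, you can feed it a couple of names by hand and look at the payload it builds (the file names here are just placeholders, and I'm using a plain jq binary rather than jq-win64.exe):
printf '%s\n' name1.jpg name2.jpg | jq -cnR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}'
# -> {"Objects":[{"Key":"name1.jpg"},{"Key":"name2.jpg"}]}
That is exactly the shape the --delete argument of delete-objects expects. Note that the API accepts at most 1000 keys per call, so batching 500 per iteration stays well under the limit.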
Upvotes: 0
Reputation: 71
I figured this out with this simple bash script:
#!/bin/bash
set -e
# Read one file name per line from files.txt and delete it from the bucket
while read -r line
do
    aws s3 rm "s3://test-bucket/$line"
done < files.txt
Inspired by this answer. The answer is: delete them one at a time!
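One small safety net if you go this route: aws s3 rm has a --dryrun flag, so you can run the same loop once without deleting anything and check that the paths resolve the way you expect (a sketch, assuming the same files.txt and bucket as above):
#!/bin/bash
set -e
# Dry run: prints what would be deleted without actually removing anything
while read -r line
do
    aws s3 rm "s3://test-bucket/$line" --dryrun
done < files.txt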
Upvotes: 1