magdazelena

Reputation: 71

Delete files from S3 bucket based on names listed in a file using the CLI

I'm trying to delete multiple (thousands of) files from an Amazon S3 bucket. I have the file names listed in a file like so:

name1.jpg
name2.jpg
...
name2020201.jpg

I tried the following solution:

aws s3 rm s3://test-bucket --recursive --exclude "*" --include "data/*.*" 

from this question, but --include only takes one argument. I tried to get hacky and list names like --include "name1.jpg", but that does not work either.

This approach does not work either:

aws s3 rm s3://test-bucket < file.txt

Can you help?

Upvotes: 1

Views: 2251

Answers (2)

magdazelena

Reputation: 71

The following approach is much faster, since my first answer took ages to complete.

My first approach was to delete one file at a time using the rm command. This is not efficient: after around 15 hours (!) it had deleted only around 40,000 records, about a fifth of the total.

This approach by Norbert Preining is way faster. As he explains, it uses the s3api command delete-objects, which can bulk-delete objects in storage and takes a JSON object as an argument. To turn the list of file names into the required JSON, the script uses the jq JSON processor (read more here). The script processes 500 records per iteration, which stays safely under the 1,000-key limit that delete-objects imposes per request.

# Read 500 names per batch, build the delete-objects JSON payload with jq,
# and bulk-delete the batch. Replace BUCKET with your bucket name; on
# Linux/macOS use plain `jq` instead of the jq-win64.exe binary.
cat file-with-names | while mapfile -t -n 500 ary && ((${#ary[@]})); do
        objdef=$(printf '%s\n' "${ary[@]}" | ./jq-win64.exe -nR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}')
        aws s3api --no-cli-pager delete-objects --bucket BUCKET --delete "$objdef"
done
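
For reference, here is the shape of the payload the jq filter produces for a batch (a minimal sketch with two made-up file names; this Objects/Key structure is what delete-objects expects in its --delete argument):

printf 'name1.jpg\nname2.jpg\n' | jq -cnR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}'
# prints: {"Objects":[{"Key":"name1.jpg"},{"Key":"name2.jpg"}]}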

Upvotes: 0

magdazelena

Reputation: 71

I figured this out with this simple bash script:

#!/bin/bash
set -e
# Delete one object per line from files.txt. read -r keeps backslashes
# in names intact; quoting handles names containing spaces.
while read -r line
do
   aws s3 rm "s3://test-bucket/$line"
done <files.txt
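
If you want to be careful, you can preview the same loop first with the --dryrun flag of aws s3 rm, which reports what would be deleted without removing anything (a sketch, assuming the same files.txt):

#!/bin/bash
set -e
# Preview only: --dryrun lists the deletions without performing them.
while read -r line
do
   aws s3 rm --dryrun "s3://test-bucket/$line"
done <files.txt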

Inspired by this answer. The answer is: delete one at a time!

Upvotes: 1
