magdazelena

Reputation: 71

Delete files from S3 bucket based on names listed in a file using the CLI

I'm trying to delete multiple (thousands of) files from an Amazon S3 bucket. I have the file names listed in a file like so:

name1.jpg
name2.jpg
...
name2020201.jpg

I tried the following solution:

aws s3 rm s3://test-bucket --recursive --exclude "*" --include "data/*.*" 

from this question, but --include only takes one argument. I tried to get hacky and list names like --include "name1.jpg", but that does not work either.

This approach does not work either:

aws s3 rm s3://test-bucket < file.txt

Can you help?

Upvotes: 1

Views: 2251

Answers (2)

magdazelena

Reputation: 71

The following approach is much faster, since my first answer took ages to complete.

My first approach was to delete one file at a time using the rm command. This is not efficient: after around 15 hours (!) it had deleted only around 40,000 records, about a fifth of the total.

This approach by Norbert Preining is way faster. As he explains, it uses the s3api command delete-objects, which can bulk-delete objects in storage and takes a JSON object as an argument. To turn the list of file names into the required JSON, the script uses the jq JSON processor (read more here). The script processes 500 records per iteration, which stays safely under the 1,000-key limit that delete-objects imposes per request.

# Read 500 names per batch, build the delete-objects JSON payload with jq,
# and bulk-delete the batch. Replace BUCKET with your bucket name; on
# Linux/macOS use plain `jq` instead of the jq-win64.exe binary.
cat file-with-names | while mapfile -t -n 500 ary && ((${#ary[@]})); do
        objdef=$(printf '%s\n' "${ary[@]}" | ./jq-win64.exe -nR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}')
        aws s3api --no-cli-pager delete-objects --bucket BUCKET --delete "$objdef"
done
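
For reference, here is the shape of the payload the jq filter produces for a batch (a minimal sketch with two made-up file names; this Objects/Key structure is what delete-objects expects in its --delete argument):

printf 'name1.jpg\nname2.jpg\n' | jq -cnR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}'
# prints: {"Objects":[{"Key":"name1.jpg"},{"Key":"name2.jpg"}]}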

Upvotes: 0

magdazelena

Reputation: 71

I figured this out with this simple bash script:

#!/bin/bash
set -e
# Delete one object per line from files.txt. read -r keeps backslashes
# in names intact; quoting handles names containing spaces.
while read -r line
do
   aws s3 rm "s3://test-bucket/$line"
done <files.txt
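
If you want to be careful, you can preview the same loop first with the --dryrun flag of aws s3 rm, which reports what would be deleted without removing anything (a sketch, assuming the same files.txt):

#!/bin/bash
set -e
# Preview only: --dryrun lists the deletions without performing them.
while read -r line
do
   aws s3 rm --dryrun "s3://test-bucket/$line"
done <files.txt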

Inspired by this answer. The answer is: delete one at a time!

Upvotes: 1
