Reputation: 607
What I have: several old S3 buckets with about 1M objects in each, with server-side encryption turned on.
Problem: the old files are unencrypted, and I can't say when encryption was turned on, so I need to find all unencrypted files.
I've tried a solution with the AWS CLI, but it is pretty slow: roughly one request every 2 seconds.
My solution:
s3_buckets="uploads tmp logs whatever "
for s3_bucket in $s3_buckets;
do
aws s3 ls s3://$s3_bucket --recursive \
| awk '{print $NF}' \
| ( while read object ;
do
object_status=$(aws s3api head-object --bucket $s3_bucket --key $object --query ServerSideEncryption --output text 2>&1)
if [ "$object_status" != "AES256" ]; then
echo "Unencrypted object $object in s3://$s3_bucket"; >> /tmp/body.tmp
objects_unencrypted=$((objects_unencrypted + 1))
fi
objects_count=$((objects_count + 1))
done
echo "Bucket $s3_bucket has $objects_count, where unencrypted $objects_unencrypted." >> /tmp/body.tmp )
done
So, maybe there are better solutions?
Is it possible to create a CloudWatch metric to show unencrypted files? Or any other approach?
Upvotes: 8
Views: 4850
Reputation: 78563
Use Amazon S3 Inventory.
The inventory report lists the objects in an S3 bucket, and the metadata reported for each listed object can include, among other things, its server-side encryption status.
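For example, an inventory configuration that includes the encryption status field can be created with boto3. This is a minimal sketch, assuming a source bucket named uploads and a hypothetical destination bucket my-inventory-reports that already has the required bucket policy:

import boto3

s3 = boto3.client("s3")

# Daily inventory of the "uploads" bucket that reports the EncryptionStatus field.
# Bucket names and the report prefix are placeholders for illustration.
s3.put_bucket_inventory_configuration(
    Bucket="uploads",
    Id="encryption-audit",
    InventoryConfiguration={
        "Id": "encryption-audit",
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        "OptionalFields": ["EncryptionStatus"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": "arn:aws:s3:::my-inventory-reports",  # hypothetical destination bucket
                "Prefix": "inventory",
                "Format": "CSV",
            }
        },
    },
)

Once the first report is delivered, you can filter its CSV for objects whose encryption status is not SSE instead of issuing a head-object call per key.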
Upvotes: 8
Reputation: 14905
I am afraid there is no better solution than listing all the files and checking the encryption of each one (see also https://github.com/aws/aws-sdk-js/issues/1778).
There is no CloudWatch metric for encryption status. The list of available metrics is given at https://docs.aws.amazon.com/AmazonS3/latest/dev/cloudwatch-monitoring.html
That being said, you can speed up the process a bit by writing a Python or Node script to do the check for you. It will run faster than the shell script above because it does not spawn a new process (the AWS CLI, which loads a full Python runtime) for each object.
The issue linked above gives an example in Node; the same applies to Python.
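As a rough illustration, a single-process boto3 version could look like the sketch below; the bucket name is just an example, and the loop still makes one head-object call per key, it just avoids starting a new process each time:

import boto3

def find_unencrypted(bucket):
    """List every object in the bucket and return the keys stored without SSE."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    unencrypted = []
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            head = s3.head_object(Bucket=bucket, Key=obj["Key"])
            # head_object omits ServerSideEncryption for unencrypted objects
            if head.get("ServerSideEncryption") not in ("AES256", "aws:kms"):
                unencrypted.append(obj["Key"])
    return unencrypted

for key in find_unencrypted("uploads"):
    print("Unencrypted:", key)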
Another way to speed up the process is to run multiple scripts in parallel, each handling part of your name space. Assuming your object key names are distributed evenly (first letter of the object key is [a-z] and your object key's first letter distribution is uniform) : you can create 26 scripts, each listing one letter (all keys starting with 'a', all keys starting with 'b' etc) and run these scripts in parallel. This will take advantage of the massive parallelism of S3. The letter example can be replaced with whatever is more appropriate for your use case.
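Here is a threaded variant of that idea in a single Python script, assuming object keys start with a lowercase letter; swap the prefixes for whatever split matches your key layout:

import string
from concurrent.futures import ThreadPoolExecutor

import boto3

def scan_prefix(bucket, prefix):
    """Check every object under one key prefix and return the unencrypted keys."""
    s3 = boto3.client("s3")
    unencrypted = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            head = s3.head_object(Bucket=bucket, Key=obj["Key"])
            if head.get("ServerSideEncryption") not in ("AES256", "aws:kms"):
                unencrypted.append(obj["Key"])
    return unencrypted

# One worker per first letter of the key space; each thread lists and checks its own prefix.
with ThreadPoolExecutor(max_workers=26) as pool:
    results = pool.map(lambda p: scan_prefix("uploads", p), string.ascii_lowercase)

for keys in results:
    for key in keys:
        print("Unencrypted:", key)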
To minimize latency between your script and S3, I would run the script on a small EC2 instance running in the same region as your bucket.
Upvotes: 2