Psychozoic

Reputation: 607

How to find unencrypted files in an Amazon AWS S3 bucket?

What I have: several old S3 buckets with ~1M objects each, with server-side encryption turned on.

Problem: old files are unencrypted, and I can't tell when encryption was turned on. So I need to find all unencrypted files.

I've tried a solution with the AWS CLI, but it is pretty slow - about 1 request every 2 seconds.

My solution:

s3_buckets="uploads tmp logs whatever"
for s3_bucket in $s3_buckets; do
    objects_count=0
    objects_unencrypted=0
    # note: keys containing spaces are truncated by awk '{print $NF}'
    while read -r object; do
        object_status=$(aws s3api head-object --bucket "$s3_bucket" --key "$object" --query ServerSideEncryption --output text 2>&1)
        if [ "$object_status" != "AES256" ]; then
            echo "Unencrypted object $object in s3://$s3_bucket" >> /tmp/body.tmp
            objects_unencrypted=$((objects_unencrypted + 1))
        fi
        objects_count=$((objects_count + 1))
    done < <(aws s3 ls "s3://$s3_bucket" --recursive | awk '{print $NF}')
    echo "Bucket $s3_bucket has $objects_count objects, of which $objects_unencrypted are unencrypted." >> /tmp/body.tmp
done

So, maybe there are better solutions?

Is it possible to create a CloudWatch metric to show unencrypted files? Or anything else?

Upvotes: 8

Views: 4850

Answers (2)

jarmod

Reputation: 78563

Use Amazon S3 Inventory.

The inventory list contains a list of the objects in an S3 bucket and the metadata for each listed object includes, among other things:

  • Encryption status – set to SSE-S3, SSE-C, SSE-KMS, or NOT-SSE, indicating the server-side encryption used for the object. A status of NOT-SSE means that the object is not encrypted with server-side encryption.
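Once a CSV-formatted inventory report is delivered, filtering it for `NOT-SSE` objects is a one-pass scan. Here is a minimal sketch, assuming the report's optional fields were configured so that bucket, key, and encryption status land in columns 0, 1, and 2 (the actual column order follows whatever fields you selected in the inventory configuration):

```python
import csv
import io

def unencrypted_keys(inventory_csv, status_field=2):
    """Return keys whose encryption status is NOT-SSE from an
    S3 Inventory CSV report (reports have no header row; the
    column layout here is an assumption)."""
    keys = []
    for row in csv.reader(io.StringIO(inventory_csv)):
        if row[status_field] == "NOT-SSE":
            keys.append(row[1])
    return keys

# Illustration with a made-up two-object report:
report = '"uploads","a/file1.txt","SSE-S3"\n"uploads","b/file2.txt","NOT-SSE"\n'
print(unencrypted_keys(report))  # ['b/file2.txt']
```

Real inventory reports are delivered gzipped to a destination bucket, so in practice you would decompress and stream each report file rather than hold it in a string.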

Upvotes: 8

Sébastien Stormacq

Reputation: 14905

I am afraid there is no better solution than listing all the files and checking their encryption status one by one (see also https://github.com/aws/aws-sdk-js/issues/1778).

There is no CloudWatch metric for encryption status. The list of available S3 metrics is given at https://docs.aws.amazon.com/AmazonS3/latest/dev/cloudwatch-monitoring.html

That being said, you can speed up the process a bit by writing a Python or Node script to do that for you. It will run faster than your shell script above because it does not need to spawn a new process (and the full Python runtime behind the AWS CLI) for each object.
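As a sketch of such a script using boto3 (the bucket name below is a placeholder, and the check logic is split out from the S3 calls so it can be exercised without AWS access): head-object responses simply omit the `ServerSideEncryption` field when an object has no server-side encryption, which is what the filter keys on.

```python
def find_unencrypted(objects):
    """Given an iterable of (key, sse_status) pairs, return the keys
    with no server-side encryption (sse_status is None when the
    head-object response has no ServerSideEncryption field)."""
    return [key for key, sse in objects if sse is None]

def scan_bucket(bucket):
    """Yield (key, sse_status) for every object in the bucket.
    Assumes boto3 is installed and credentials are configured."""
    import boto3
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            head = s3.head_object(Bucket=bucket, Key=obj["Key"])
            yield obj["Key"], head.get("ServerSideEncryption")
```

You would then call something like `find_unencrypted(scan_bucket("uploads"))` and print or save the resulting keys.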

The issue linked above gives an example in Node; the same approach applies to Python.

Another way to speed up the process is to run multiple scripts in parallel, each handling part of your key namespace. Assuming your object key names are distributed evenly (e.g. the first character of each key is in [a-z] and uniformly distributed), you can create 26 scripts, each listing one letter (all keys starting with 'a', all keys starting with 'b', etc.) and run them in parallel. This takes advantage of the massive parallelism of S3. The letter example can be replaced with whatever partitioning is more appropriate for your key naming scheme.
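The fan-out over prefixes can also live inside one Python process rather than 26 separate scripts. A minimal sketch, where the per-prefix checker is injected as a callable (in real use it would call `list_objects_v2` with `Prefix=<letter>` and head each object, as in the single-threaded version; here a stub stands in for S3):

```python
import string
from concurrent.futures import ThreadPoolExecutor

def parallel_scan(check_prefix, prefixes=string.ascii_lowercase, workers=26):
    """Run check_prefix(prefix) concurrently for every prefix and
    merge the resulting lists of unencrypted keys. Threads are a
    good fit here because the work is network-bound."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for keys in pool.map(check_prefix, prefixes):
            results.extend(keys)
    return results

# Illustration with a stub instead of real S3 calls:
fake_index = {"a": ["a/unenc.txt"], "b": []}
print(parallel_scan(lambda p: fake_index.get(p, []), prefixes="ab"))
# ['a/unenc.txt']
```

`pool.map` preserves the prefix order, so the merged result is deterministic even though the lookups run concurrently.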

To minimize latency between your script and S3, I would run the script on a small EC2 instance in the same region as your bucket.

Upvotes: 2
