KidCrippler

Reputation: 1723

AWS S3 deletion of files that haven't been accessed

I'm writing a service that takes screenshots of a lot of URLs and saves them in a public S3 bucket.
Due to storage costs, I'd like to periodically purge the aforementioned bucket and delete every screenshot that hasn't been accessed in the last X days.
By "accessed" I mean downloaded or acquired via a GET request.

I checked out the documentation and found a lot of ways to define an expiration policy for an S3 object, but couldn't find a way to "mark" a file as read once it's been accessed externally.

Is there a way to define the periodic purge without code (only AWS rules/services)? Does the API even allow that or do I need to start implementing external workarounds?

Upvotes: 3

Views: 1587

Answers (3)

Eligio Mariño

Reputation: 372

You can use the solution described in an AWS blog post (linked below). It's based on S3 Inventory and S3 server access logs. In summary, a Lambda function runs an Athena query to select the objects that need to be deleted.

Here is the architecture flow:

  1. The S3 server access logs capture S3 object requests. These are generated and stored in the target S3 bucket.
  2. An S3 inventory report is generated for the source bucket daily. It is written to the S3 inventory target bucket.
  3. An Amazon EventBridge rule is configured that will initiate an AWS Lambda function once a day, or as desired.
  4. The Lambda function initiates an S3 Batch Operations job to tag the objects in the source bucket that should be expired, using the following logic:
    • Read the number of days (x) from the S3 Lifecycle configuration.
    • Run an Amazon Athena query that joins the S3 inventory report with the server access logs, producing a delta list of objects that were created more than x days ago but not accessed during that time.
    • Write a manifest file listing these objects to an S3 bucket.
    • Create an S3 Batch Operations job that tags all objects in the manifest file with “delete=True”.
  5. The Lifecycle rule on the source S3 bucket expires all objects that were created more than x days ago and carry the “delete=True” tag applied by the batch operation.

Object expiry architecture flow diagram: https://aws.amazon.com/blogs/architecture/expiring-amazon-s3-objects-based-on-last-accessed-date-to-decrease-costs/
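The Athena step above can be sketched roughly as follows. The table names, column names, and the access-log operation string are illustrative assumptions; the real values depend on how the inventory report and access-log tables were defined in Athena:

```python
from datetime import datetime, timedelta, timezone

def build_expiry_query(inventory_table: str, access_log_table: str, days: int) -> str:
    """Build an Athena query listing objects created more than `days` ago
    that have no GET requests in that window.

    Hypothetical schema: the inventory table exposes `key` and
    `last_modified_date`, and the access-log table exposes `key`,
    `operation`, and `request_time`.
    """
    cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).strftime("%Y-%m-%d")
    return f"""
        SELECT inv.key
        FROM {inventory_table} inv
        LEFT JOIN {access_log_table} logs
            ON inv.key = logs.key
            AND logs.operation = 'REST.GET.OBJECT'
            AND logs.request_time >= DATE '{cutoff}'
        WHERE inv.last_modified_date < DATE '{cutoff}'
          AND logs.key IS NULL
    """
```

The `LEFT JOIN ... IS NULL` pattern is what produces the "created before the cutoff but never accessed since" delta list; the Lambda would pass the resulting keys into the Batch Operations manifest.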

Upvotes: 0

John Rotenstein

Reputation: 270224

You can use Amazon S3 Storage Class Analysis:

By using Amazon S3 analytics storage class analysis you can analyze storage access patterns to help you decide when to transition the right data to the right storage class. This new Amazon S3 analytics feature observes data access patterns to help you determine when to transition less frequently accessed STANDARD storage to the STANDARD_IA (IA, for infrequent access) storage class.

After storage class analysis observes the infrequent access patterns of a filtered set of data over a period of time, you can use the analysis results to help you improve your lifecycle policies.

Even if you don't use it to change Storage Class, you can use it to discover which objects are not accessed frequently.

Upvotes: 2

Deepak Singhal

Reputation: 10876

There is no such built-in service provided by AWS. You will have to write your own solution.
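If you do roll your own, the core selection logic is small. A minimal sketch, assuming you have already parsed the access logs into a mapping of object key to its most recent access time (falling back to upload time for never-downloaded objects):

```python
from datetime import datetime, timedelta, timezone

def keys_to_purge(last_access: dict, max_age_days: int) -> list:
    """Return the keys whose most recent access (or upload, if never
    downloaded) is older than max_age_days.

    `last_access` maps object key -> timezone-aware datetime; building
    it from S3 server access logs is left to the caller.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    return sorted(key for key, ts in last_access.items() if ts < cutoff)
```

The returned keys would then be fed to `delete_objects` calls (boto3 accepts up to 1,000 keys per call) from a scheduled Lambda or cron job.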

Upvotes: 0
