Reputation: 4230
I came across the AWS Data Pipeline template for backing up a DynamoDB table to S3. However, I do not want to back up the whole table; I just want to keep a snapshot of the changes that happened in the last 7 days.
I think the way to approach this is to add a GSI on my table on a last_updated_date column and scan for records that changed. Is it possible to use AWS Data Pipeline to achieve this?
Upvotes: 2
Views: 602
Reputation: 410
What you are trying to do is very similar to the example provided for HiveCopyActivity. The example copies data between two DynamoDB tables. You would need to make a couple of changes:
Replace the output with an S3DataNode pointing to the bucket where you want the backups to be saved.
Change the filterSql to pull the last 7 days of data, something like:
"filterSql" : "last_updated_date > unix_timestamp(\"#{minusDays(@scheduledStartTime,7)}\", \"yyyy-MM-dd'T'HH:mm:ss\")"
Upvotes: 1
Reputation: 6413
Unless this is just a one-time task for you, I recommend using DynamoDB Streams with Kinesis or Lambda to back up changes to durable storage. DynamoDB Streams captures a time-ordered sequence of item-level modifications in a DynamoDB table and stores this information in a log for up to 24 hours. You can trigger a Lambda function from DynamoDB Streams and have it write changes to S3, achieving near-real-time, continuous backup.
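As an illustration, here is a minimal sketch of such a Lambda handler in Python. The bucket name and S3 key layout are placeholders I made up, and it assumes the table's stream is configured as the Lambda trigger with a view type that includes new images:

import json
import os
import boto3

s3 = boto3.client("s3")
# Placeholder bucket name; configure via the Lambda environment.
BUCKET = os.environ.get("BACKUP_BUCKET", "my-backup-bucket")

def handler(event, context):
    # Each invocation receives a batch of item-level changes from the stream.
    for record in event["Records"]:
        change = record["dynamodb"]
        # Use the event id so every change lands in its own S3 object.
        key = "dynamodb-changes/{}/{}.json".format(record["eventName"], record["eventID"])
        body = json.dumps({
            "eventName": record["eventName"],    # INSERT, MODIFY or REMOVE
            "keys": change.get("Keys"),
            "newImage": change.get("NewImage"),  # present if the stream view type includes new images
            "oldImage": change.get("OldImage"),
        })
        s3.put_object(Bucket=BUCKET, Key=key, Body=body)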
Using a GSI you can of course make lookups faster, but you will need a lot of provisioned throughput capacity on both the GSI and the table itself for a task that processes a large table.
You can find relevant AWS documentation about Streams below:
1. Capturing Table Activity with DynamoDB Streams
2. Using the DynamoDB Streams Kinesis Adapter to Process Stream Records
There's also a nice blog post about it with examples:
DynamoDB Update – Triggers (Streams + Lambda) + Cross-Region Replication App
Hope this helps!
Upvotes: 2