Toby Cannon

Reputation: 735

Alternative to scanning AWS DynamoDB?

I understand that scanning DynamoDB is not recommended and is considered bad practice.

Let's say I have a food ordering website and I want to do a daily scan of all users to find out who hasn't ordered food in the last week so I can send them an email (just an example).

This would put some very spiky demand on the database, especially with a large user base.

Is there an alternative to these scheduled scans that I'm missing? Or in this scenario is a scan the best tool for the job?

Upvotes: 2

Views: 1349

Answers (1)

Jens

Reputation: 21530

There are a lot of possible answers to this question. As so often, it begins with the simple truth that the best way to do something like this depends on the actual specifics and on what you are trying to optimise for (cost, latency, duration, etc.).

Since this appears to be a scheduled batch job, I would guess that latency and job duration are not high on the priority list, but cost might be.

The next important thing to consider is implementation complexity. For example, if your service only has 100 users, I would not bother with any of the more complex solutions and would just do a scan. But if your service has millions of users, that is probably no longer a great idea.

For the purpose of this answer I am going to assume that your user base has become too large to just do a scan. In this scenario I can think of two possible solutions:

  1. Add a separate index that allows you to "query" for the last order date easily.
  2. Use an S3 export/backup

The first should be fairly self-explanatory. As often described in DynamoDB articles, you are supposed to define your "access patterns" up front and build indexes around them. The pro here is that you stay entirely within DynamoDB; the con is the added cost of maintaining the index.
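To make the index idea concrete, here is a minimal sketch of what the query side could look like. It assumes a hypothetical GSI named `LastOrderDateIndex` whose partition key is a constant `entityType = "USER"` and whose sort key is an ISO-8601 `lastOrderDate` attribute (a single fixed partition key keeps the example simple; in practice you would shard it to avoid a hot partition). The function only builds the request parameters you would pass to a DynamoDB `Query` call (e.g. boto3's `client.query`), so it runs without AWS credentials.

```python
# Sketch of solution 1: query a GSI instead of scanning the whole table.
# All index and attribute names here (LastOrderDateIndex, entityType,
# lastOrderDate) are assumptions for illustration, not the OP's schema.
from datetime import datetime, timedelta, timezone


def build_inactive_users_query(table_name: str, days: int = 7) -> dict:
    """Build parameters for a DynamoDB Query that returns users whose
    last order is older than `days` days."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    return {
        "TableName": table_name,
        "IndexName": "LastOrderDateIndex",  # hypothetical GSI
        # Query only works within one partition, hence the constant key.
        "KeyConditionExpression": "entityType = :t AND lastOrderDate < :cutoff",
        "ExpressionAttributeValues": {
            ":t": {"S": "USER"},
            ":cutoff": {"S": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")},
        },
    }
```

You would pass this dict to `client.query(**params)` and paginate with `LastEvaluatedKey` until all matching users are collected. Unlike a scan, this only reads the items that actually match the condition.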

My preferred solution is probably to just do scheduled exports of the table to S3 and then process the export somewhere else, for example with a custom tool you write, or with an AWS service built for processing large amounts of data. This is probably the cheapest solution, but processing time might not be "super fast".
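As a sketch of that offline-processing step: a DynamoDB export to S3 in DynamoDB JSON format contains one object per line, each wrapping the item in DynamoDB's typed attribute notation (`{"S": ...}` etc.). The attribute names below (`userId`, `email`, `lastOrderDate`) are assumptions for this example, not the OP's schema, and real export files would first be downloaded and gunzipped.

```python
# Sketch of solution 2: filter inactive users from a DynamoDB S3 export.
# Each input line is one exported item in DynamoDB JSON, e.g.
#   {"Item": {"userId": {"S": "u1"}, "lastOrderDate": {"S": "..."}}}
import json
from datetime import datetime, timedelta, timezone


def find_inactive_users(export_lines, days=7, now=None):
    """Yield (userId, email) for users whose lastOrderDate is older
    than `days` days. `now` is injectable for testing."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    for line in export_lines:
        item = json.loads(line)["Item"]
        last_order = datetime.fromisoformat(item["lastOrderDate"]["S"])
        if last_order < cutoff:
            yield item["userId"]["S"], item["email"]["S"]
```

The resulting list could then feed whatever sends the emails. Because this runs against a static file, it puts zero read load on the live table, which is the main appeal of this approach.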

I am looking forward to other solutions to this interesting question.

Upvotes: 6
