user2589299
user2589299

Reputation: 129

DynamoDB index/query questions

I need to create a table with the following fields : place, date, status

Table has approximately 300k rows per day and about 3 days worth of data at any given time, so about 1 million rows. I have a service that is continuously populating data to this DDB. I need to run the following queries (only) once per day :

#1 Return count of all places with date = current_date-1 #2 Return count and list of all places with date= current_date-1 and status = 0

Questions :

  1. As date is already a sort key, is query #1 bound to be quick?
  2. Do we need to create indexes on sort key fields ?
  3. If answer to above question is yes: for query #2, do I need to create a GSI on date and status? with date as Partition key, and status as sort key?
  4. Creating a GSI vs using filter expression on status for query #2. Which of the two is recommended?

Upvotes: 0

Views: 238

Answers (2)

hunterhacker
hunterhacker

Reputation: 7132

Easiest thing for you to do is a full table scan once per day filtering by yesterday's date, and as part of that keep your own client-side count on if the status was 0 or 1. The filter is not index optimized so it will be a true full table scan.

Why not an export to S3? Because you're really just doing one query. If you follow the export route you'll have to a new export every day to keep the data fresh and the cost of the export in dollar terms (plus complexity) is more than a single full scan. If you were going to do repeated queries against the data then the export makes more sense.

Why not use GSIs? They would make the table scan more efficient by minimizing what's scanned. However, there's a cost (plus complexity) in keeping them current.

Short answer: a once per day full table scan is both simple to implement and as fast as you want (parallel scan is an option), plus it's not really costly.

How much would it cost? Million rows, 100 bytes each, so that's a 100 MB table. That's 25,000 read units to fully scan, which is halved down to 12,500 with eventual consistency. On Demand pricing is $0.25 per million read units. 12,500 / 1,000,000 * $0.25 = $0.003. Less than a dime a month. It'd be cheaper still if you run provisioned.

Just do the scan. :)

Upvotes: 0

Guy
Guy

Reputation: 12891

Running analytical queries (such as count) is a wrong usage of a NoSQL database such as DynamoDB that is designed for scalable LOOKUP use cases.

Even if you get the SCAN to work with one design or another, it will be more expensive and slow than it should.

A better option is to export the table data from DynamoDB into S3, and then run an Athena query over that data. It will be much more flexible to run various analytical queries.

Upvotes: 1

Related Questions