Reputation: 91
Is there any metric for when to choose a full table scan over a GSI, or the other way around?
I know the basic concept behind both, but GSI pricing depends so heavily on the table itself that I'm having a hard time deciding.
More importantly, how does that scale with table size? At what point does scanning become too inefficient and a GSI become necessary?
By the way, I'm having a hard time finding good resources on filter expressions for Query and Scan in DynamoDB. Any good recommendations? ("#v >= :num" is what I mean; I'm probably not searching with the correct term.)
Upvotes: 2
Views: 1144
Reputation: 55720
In general the decision to use a query versus a scan comes down to how much of the data you need. If the answer is that you need most of the data (which in practice can only really be the case for relatively small tables) then use a scan. Otherwise, use a query, pretty much every time.
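To make the difference concrete, here is a minimal sketch of the two request shapes, using the "#v >= :num" style of expression from the question. The function names and the table, index and attribute names in the usage note are made up for illustration. The parameter keys are the ones the low-level DynamoDB API accepts for Scan and Query. A scan reads every item and only then applies the filter, so you pay for everything that was read. A query on a GSI only reads the items that match the key condition.

```python
def scan_request(table, attr, minimum):
    """Build Scan parameters: every item is read, then the filter trims the result."""
    return {
        "TableName": table,
        "FilterExpression": "#v >= :num",
        # "#v" is a placeholder for the attribute name, ":num" for the value.
        "ExpressionAttributeNames": {"#v": attr},
        "ExpressionAttributeValues": {":num": {"N": str(minimum)}},
    }


def gsi_query_request(table, index, pk_attr, pk_value, sort_attr, minimum):
    """Build Query parameters against a GSI: only items matching the keys are read."""
    return {
        "TableName": table,
        "IndexName": index,
        # Equality on the index partition key, range condition on its sort key.
        "KeyConditionExpression": "#pk = :pk AND #v >= :num",
        "ExpressionAttributeNames": {"#pk": pk_attr, "#v": sort_attr},
        "ExpressionAttributeValues": {
            ":pk": {"S": pk_value},
            ":num": {"N": str(minimum)},
        },
    }
```

With a boto3 client these would be passed as `client.scan(**scan_request("orders", "total", 100))` or `client.query(**gsi_query_request("orders", "by_customer", "customer_id", "c-42", "total", 100))`, assuming a string partition key and a numeric sort key on the hypothetical `by_customer` index.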
It’s impossible to give a hard threshold for what ‘most of the data’ means. I’d say definitely more than 50% and that this threshold tends to 100% as the table size grows.
The exception to the above would be one-off operations that can be performed in the background and where you’re willing to trade time for cost. The corollary is that if you are getting data for a customer-facing request, your aim should be to read as little from the database as possible to keep request times short.
All this being said, parallel scans can be super quick to pull in a lot of data, if you really need it and you are in a position to consume extra capacity. Even on largish tables, as long as you have the capacity to spare you can pull in hundreds of thousands of items in just a few seconds.
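For reference, a parallel scan might look roughly like the sketch below. It assumes `client` is a low-level DynamoDB client (for example `boto3.client("dynamodb")`). The helper names and the default segment count are just for illustration. `Segment`, `TotalSegments`, `ExclusiveStartKey` and `LastEvaluatedKey` are the actual Scan parameters and response fields. Each worker scans one segment and follows the pagination key until that segment is exhausted.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table, segment, total_segments):
    """Read every page of one segment of the table."""
    items = []
    kwargs = {
        "TableName": table,
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            # No pagination key means this segment is done.
            return items
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client, table, total_segments=4):
    """Scan all segments concurrently and return the combined items."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table, segment, total_segments)
            for segment in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]
```

More segments means more read capacity consumed at once, so the segment count is the knob to tune against whatever capacity you have to spare.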
Upvotes: 3