Reputation: 31

DynamoDB Scan Vs Query on same data

I have a use case where I have to return all elements of a table in DynamoDB.

Suppose my table has a partition key (Column X) with the same value in every row, say "monitor", and a sort key (Column Y) with distinct values.

Will there be any difference in execution time in the below approaches or is it the same?

1. Scanning the whole table.
2. Querying the data by the partition key value "monitor".

Upvotes: 0

Answers (3)

Itay Maman

Reputation: 30733

Direct answer

To the best of my knowledge, in the specific case you are describing, Scan will be marginally slower (especially for the first response). This assumes you do no filtering, i.e., FilterExpression is empty.

Further thoughts

DynamoDB can potentially store huge amounts of data. By "huge" I mean "more than can fit in any machine's RAM". If you need to 'return all elements of a table', you should ask yourself: what happens if that table grows to the point where all elements no longer fit in memory? You do not have to handle this right now (I believe that as of now the table is rather small), but you do need to keep in mind that you may have to come back to this code and change it to address this concern.

Questions I would ask myself if I were in your position:

(1) Can I somehow set a limit on the number of items I need to read (say, read only the first 1000 items)?

(2) How is this information (the list of items) used? Is it sent back to a JS application running inside a browser, which displays it to a user? If the answer is yes, then what will the user do with a huge list of items?

(3) Can you work on the items one at a time (or 10 or 100 at a time)? If the answer is yes, then you only need to hold one (or 10 or 100) items in memory, not the entire list.

In general, Scan operations in DDB are used as described in (3): read one item (or several items) at a time, do some processing, and then move on to the next item.
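The batch-at-a-time pattern from (3) could look roughly like the sketch below. It assumes `client` is a low-level boto3 DynamoDB client (e.g. `boto3.client("dynamodb")`); the function name `scan_in_batches` and the default batch size are made up for illustration.

```python
from typing import Any, Dict, Iterator, List


def scan_in_batches(
    client: Any, table_name: str, batch_size: int = 100
) -> Iterator[List[Dict[str, Any]]]:
    """Yield the table's items one page at a time instead of loading them all.

    `client` is assumed to expose scan(**kwargs) like a boto3 DynamoDB client.
    """
    kwargs: Dict[str, Any] = {"TableName": table_name, "Limit": batch_size}
    while True:
        page = client.scan(**kwargs)
        items = page.get("Items", [])
        if items:
            yield items
        # DynamoDB returns LastEvaluatedKey only when more pages remain.
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break
        kwargs["ExclusiveStartKey"] = last_key
```

Because this is a generator, the caller holds only one page in memory at a time: `for batch in scan_in_batches(client, "my-table"): process(batch)`, where `process` is whatever per-batch work you need.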

Upvotes: 0

Ankit Deshpande

Reputation: 3604

Avoid using scan as far as possible.

Scan reads every row in the table, and you will also have to paginate to iterate over all of the rows. It is more like a select * from table; SQL operation.

Use Query if you want to fetch all the rows for a given partition key. If you know which partition key you want the results for, you should use Query, because it uses the partition key to go straight to the rows stored under that key instead of reading the whole table.
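As a rough sketch, a Query for every row under one partition key might look like this. It assumes `client` is a low-level boto3 DynamoDB client and that the partition key is a string attribute; `query_partition` is a hypothetical helper name. Note that Query results are paginated too, so the loop follows LastEvaluatedKey.

```python
from typing import Any, Dict, List


def query_partition(
    client: Any, table_name: str, pk_name: str, pk_value: str
) -> List[Dict[str, Any]]:
    """Return all items whose partition key equals pk_value.

    Assumes a string ("S") partition key and a boto3-style client.query().
    """
    kwargs: Dict[str, Any] = {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk",
        # A name placeholder avoids clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#pk": pk_name},
        "ExpressionAttributeValues": {":pk": {"S": pk_value}},
    }
    items: List[Dict[str, Any]] = []
    while True:
        page = client.query(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items
        kwargs["ExclusiveStartKey"] = last_key
```

For the table in the question, the call would be something like `query_partition(client, "my-table", "X", "monitor")`, with the table name being a placeholder.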

Upvotes: 0

Shaho

Reputation: 478

You should use the parallel scan concept. Basically, you run multiple scans at once on different segments of the table. Watch out for higher RCU usage, though.
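A minimal sketch of a parallel scan, assuming `client` is a low-level boto3 DynamoDB client shared across threads; the helper names `scan_segment` and `parallel_scan` and the default of 4 segments are illustrative choices, not recommendations. Each worker passes its own Segment number together with the same TotalSegments value.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def scan_segment(
    client: Any, table_name: str, segment: int, total_segments: int
) -> List[Dict[str, Any]]:
    """Scan one segment of the table, following pagination to the end."""
    kwargs: Dict[str, Any] = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    items: List[Dict[str, Any]] = []
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(
    client: Any, table_name: str, total_segments: int = 4
) -> List[Dict[str, Any]]:
    """Scan all segments concurrently, one thread per segment."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, seg, total_segments)
            for seg in range(total_segments)
        ]
        # Results are concatenated in segment order, not in key order.
        return [item for fut in futures for item in fut.result()]
```

This collects everything into one list for brevity, so the memory concern raised in the other answer still applies. All segments consume read capacity at the same time, which is where the higher RCU usage comes from.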

Upvotes: 1
