Scan vs Parallel Scan in AWS DynamoDB?

Question

Among cloud storage systems, AWS is in high demand, and the scan process needs to be faster. So how does the scan process work, and which one (Scan or Parallel Scan) is better in which situation?

How does Scan work in AWS DynamoDB?
How does Parallel Scan work in AWS DynamoDB?
Scan vs Parallel Scan in AWS DynamoDB?
When is Parallel Scan preferred?
Is a filter expression applied before the scan?

SkyWalker · Accepted Answer

1. How does Scan work in AWS DynamoDB?

Ans:

i) A Scan operation reads every item in a table or secondary index and returns one or more items and item attributes.

ii) By default, Scan operations proceed sequentially.

iii) By default, Scan uses eventually consistent reads when accessing the data in a table.

iv) If the total size of the scanned items exceeds the maximum data set size limit of 1 MB, the scan stops and the results are returned to the user together with a LastEvaluatedKey value, which is used to continue the scan in a subsequent operation.

v) A Scan operation performs eventually consistent reads by default, and it can return up to 1 MB (one page) of data. Therefore, a single Scan request can consume up to

(1 MB page size / 4 KB item size) / 2 (eventually consistent reads) = 128 read operations (read capacity units).
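The pagination and the capacity arithmetic described above can be sketched in Python. This is a minimal sketch, not a definitive implementation: `client` is assumed to behave like a boto3 low-level DynamoDB client (`boto3.client("dynamodb")`), and the helper names `page_read_units` and `scan_all` are made up for illustration.

```python
import math

PAGE_LIMIT_BYTES = 1024 * 1024  # 1 MB: the most data one Scan request returns
READ_UNIT_BYTES = 4 * 1024      # one read unit covers up to 4 KB


def page_read_units(page_bytes=PAGE_LIMIT_BYTES, strongly_consistent=False):
    """Read units consumed by one Scan page of the given size."""
    units = math.ceil(page_bytes / READ_UNIT_BYTES)
    # Eventually consistent reads (the default) cost half as much.
    return units if strongly_consistent else units / 2


def scan_all(client, table_name, **scan_kwargs):
    """Yield every item, following LastEvaluatedKey from page to page.

    Assumption: `client` behaves like boto3.client("dynamodb").
    """
    kwargs = dict(scan_kwargs, TableName=table_name)
    while True:
        page = client.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break  # no key returned: the scan has reached the end
        # Resume the next request where this page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

With the defaults, `page_read_units()` reproduces the 128 figure from the formula, and `scan_all` keeps issuing requests until a page comes back without a LastEvaluatedKey.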

2. How does Parallel Scan work in AWS DynamoDB?

Ans:

i) For faster performance on a large table or secondary index, applications can request a parallel Scan operation.

ii) You can run multiple worker threads or processes in parallel. Each worker will be able to scan a separate segment of a table concurrently with the other workers. DynamoDB’s Scan function now accepts two additional parameters:

TotalSegments denotes the number of workers that will access the table concurrently.
Segment denotes the segment of the table to be accessed by the calling worker.

iii) The two parameters, when used together, limit the scan to a particular block of items in the table. You can also use the existing Limit parameter to control how much data is returned by an individual Scan request.
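As a rough illustration of how Segment and TotalSegments fit together, here is a minimal sketch that runs one worker thread per segment. It assumes `client` behaves like a boto3 low-level DynamoDB client that can be shared between threads; the function names `scan_segment` and `parallel_scan` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments):
    """Read one segment to completion, following LastEvaluatedKey."""
    kwargs = {
        "TableName": table_name,
        "Segment": segment,               # which slice this worker reads
        "TotalSegments": total_segments,  # how many slices exist in total
    }
    items = []
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        if "LastEvaluatedKey" not in page:
            return items
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def parallel_scan(client, table_name, total_segments=4):
    """Scan all segments concurrently and merge the results.

    Assumption: `client` behaves like boto3.client("dynamodb").
    """
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, seg, total_segments)
            for seg in range(total_segments)
        ]
        results = []
        for future in futures:
            results.extend(future.result())
    return results
```

Each worker paginates its own segment independently, so the segments never overlap. The default of 4 segments is arbitrary; a suitable value depends on the table size and the read throughput available.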

3. Scan vs Parallel Scan in AWS DynamoDB?

Ans:

i) A sequential Scan operation can only read one partition at a time. A parallel scan is needed to read from multiple partitions at the same time.

ii) A sequential Scan might not always be able to fully utilize the provisioned read throughput capacity. A parallel scan can make use of that spare capacity.

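iii) A parallel scan does not by itself lower the read cost: it consumes the same total read capacity as a sequential Scan over the same data, only in less elapsed time. The "up to 4x" cost reduction for certain types of queries and scans that is often quoted with it appears to come from the cheaper reads announced by AWS alongside parallel scans, not from parallel scanning itself.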

4. When is Parallel Scan preferred?

Ans:

A parallel scan can be the right choice if the following conditions are met:

The table size is 20 GB or larger.
The table's provisioned read throughput is not being fully utilized.
Sequential Scan operations are too slow.

5. Is a filter expression applied before the scan?

Ans: No. A FilterExpression is applied after the items have already been read, so filtering does not reduce the read capacity that the Scan consumes. The process of filtering does not consume any additional read capacity units either.
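One way to see that filtering happens after the read is to compare the Count and ScannedCount fields of a filtered Scan response. The sketch below assumes `client` behaves like a boto3 low-level DynamoDB client and that the filtered attribute is a string; `count_filtered` is a hypothetical helper name.

```python
def count_filtered(client, table_name, attr_name, attr_value):
    """Return (matched, scanned) totals for a filtered Scan.

    Assumption: `client` behaves like boto3.client("dynamodb") and
    the attribute being filtered holds a string ("S") value.
    """
    kwargs = {
        "TableName": table_name,
        "FilterExpression": "#a = :v",
        "ExpressionAttributeNames": {"#a": attr_name},
        "ExpressionAttributeValues": {":v": {"S": attr_value}},
    }
    matched = scanned = 0
    while True:
        page = client.scan(**kwargs)
        matched += page["Count"]          # items left after the filter
        scanned += page["ScannedCount"]   # items read before the filter
        if "LastEvaluatedKey" not in page:
            return matched, scanned
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```

If `scanned` is much larger than `matched`, the Scan is paying to read items that the filter then discards.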
