sas1138

Reputation: 358

Boto3 DynamoDb Query with Select Count without pagination

This is more of a concept clarification. I can already get the actual count with Boto3 by repeating the query with the LastEvaluatedKey of the previous response.

I want to count the items matching certain conditions in DynamoDB. I am using Select = COUNT, which according to the docs [1] should return just the count of matched items, and my assumption was that the response would not be paginated.

COUNT - Returns the number of matching items, rather than the matching items themselves.

When I try it via the AWS CLI, my assumption seems correct (like the REST API samples in the docs [1]):

    aws dynamodb query \
    --table-name 'my-table' \
    --index-name 'classification-date-index' \
    --key-condition-expression 'classification = :col AND #dt BETWEEN :start AND :end' \
    --expression-attribute-values '{":col" : {"S":"INTERNAL"}, ":start" : {"S": "2020-04-10"}, ":end" : {"S": "2020-04-25"}}' \
    --expression-attribute-names '{"#dt" : "date"}' \
    --select 'COUNT'
    {
        "Count": 18817,
        "ScannedCount": 18817,
        "ConsumedCapacity": null
    }

But when I try using Python3 and Boto3, the response is paginated, and I have to repeat the query till LastEvaluatedKey is empty.

    In [22]: table.query(IndexName='classification-date-index', Select='COUNT', KeyConditionExpression= Key('classification').eq('INTERNAL') & Key('date').between('2020-04-10', '2020-04-25'))

    Out[22]:
    {'Count': 5667,
     'ScannedCount': 5667,
     'LastEvaluatedKey': {'classification': 'INTERNAL',
      'date': '2020-04-14',
      's3Path': '<redacted>'},
     'ResponseMetadata': {'RequestId': 'TH3ILO0P47QB7GAU9M3M98BKJVVV4KQNSO5AEMVJF66Q9ASUAAJG',
      'HTTPStatusCode': 200,
      'HTTPHeaders': {'server': 'Server',
       'date': 'Sat, 25 Apr 2020 13:32:36 GMT',
       'content-type': 'application/x-amz-json-1.0',
       'content-length': '230',
       'connection': 'keep-alive',
       'x-amzn-requestid': 'TH3ILO0P47QB7GAU9M3M98BKJVVV4KQNSO5AEMVJF66Q9ASUAAJG',
       'x-amz-crc32': '133035383'},
      'RetryAttempts': 0}}
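For reference, the repeated query mentioned above could be sketched roughly as follows. This is only a minimal sketch: `count_matching` and `query_fn` are hypothetical names, and `query_fn` is assumed to behave like `table.query`, i.e. it accepts `ExclusiveStartKey` and returns a dict with `Count` and, on non-final pages, `LastEvaluatedKey`.

```python
def count_matching(query_fn, **query_kwargs):
    """Sum Count over every page of a Select='COUNT' query.

    query_fn is assumed to behave like boto3's Table.query.
    """
    total = 0
    while True:
        response = query_fn(Select="COUNT", **query_kwargs)
        total += response["Count"]
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            # No LastEvaluatedKey means this was the final page.
            return total
        # Resume the next request where this page stopped.
        query_kwargs["ExclusiveStartKey"] = last_key


# Hypothetical usage against the table from the question:
# total = count_matching(
#     table.query,
#     IndexName="classification-date-index",
#     KeyConditionExpression=Key("classification").eq("INTERNAL")
#     & Key("date").between("2020-04-10", "2020-04-25"),
# )
```

Passing the query function in, rather than the table itself, keeps the loop easy to try out against a stub without touching DynamoDB.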

I expected the Boto3 SDK to behave the same as the AWS CLI, since the response appears to be smaller than 1 MB. The docs are slightly conflicting ...

"Paginating Table Query Results" [2] page says :

DynamoDB paginates the results from Query operations. With pagination, the Query results are divided into "pages" of data that are 1 MB in size (or less). An application can process the first page of results, then the second page, and so on. A single Query only returns a result set that fits within the 1 MB size limit.

While the "Query" [1] page says:

A single Query operation will read up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then apply any filtering to the results using FilterExpression.

[1] https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html

[2] https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.Pagination.html

Upvotes: 2

Views: 2595

Answers (1)

Violet Olson

Reputation: 46

Just ran down this issue myself. The AWS CLI automatically sums the counts across the pages of the DynamoDB query. To stop it from doing this, add --no-paginate to your command, as listed on this page.
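To get the same summed total from Boto3, one option is to let the low-level client's `query` paginator walk the pages and add up each page's `Count`. The following is a rough sketch under that assumption, not necessarily how the CLI does it internally; `count_with_paginator` is a hypothetical helper name, and `client` is assumed to be a `boto3.client("dynamodb")` object.

```python
def count_with_paginator(client, **query_kwargs):
    """Add up Count across all pages of a Select='COUNT' query.

    client is assumed to be a low-level DynamoDB client, e.g.
    boto3.client("dynamodb"), which provides a "query" paginator.
    """
    paginator = client.get_paginator("query")
    pages = paginator.paginate(Select="COUNT", **query_kwargs)
    # Each page is one Query response carrying its own partial Count.
    return sum(page["Count"] for page in pages)


# Hypothetical usage mirroring the CLI command from the question:
# import boto3
# client = boto3.client("dynamodb")
# total = count_with_paginator(
#     client,
#     TableName="my-table",
#     IndexName="classification-date-index",
#     KeyConditionExpression="classification = :col AND #dt BETWEEN :start AND :end",
#     ExpressionAttributeNames={"#dt": "date"},
#     ExpressionAttributeValues={
#         ":col": {"S": "INTERNAL"},
#         ":start": {"S": "2020-04-10"},
#         ":end": {"S": "2020-04-25"},
#     },
# )
```

Note that the low-level client takes the key condition as an expression string with typed attribute values, as in the CLI command, rather than the `Key(...)` helpers used with `table.query`.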

Upvotes: 3
