This is more of a conceptual question. I can already get the actual counts with Boto3 by repeating the query, passing the LastEvaluatedKey of the previous response as ExclusiveStartKey.
I want to count items matching certain conditions in DynamoDB. I am using Select='COUNT', which according to the docs [1] should return only the count of matching items, so I assumed the response would not be paginated.
COUNT - Returns the number of matching items, rather than the matching items themselves.
When I try it via the AWS CLI, my assumption seems correct (like the REST API samples in the doc [1]):
aws dynamodb query \
--table-name 'my-table' \
--index-name 'classification-date-index' \
--key-condition-expression 'classification = :col AND #dt BETWEEN :start AND :end' \
--expression-attribute-values '{":col" : {"S":"INTERNAL"}, ":start" : {"S": "2020-04-10"}, ":end" : {"S": "2020-04-25"}}' \
--expression-attribute-names '{"#dt" : "date"}' \
--select 'COUNT'
{
    "Count": 18817,
    "ScannedCount": 18817,
    "ConsumedCapacity": null
}
But when I try it with Python 3 and Boto3, the response is paginated, and I have to repeat the query until no LastEvaluatedKey is returned.
In [22]: table.query(IndexName='classification-date-index', Select='COUNT', KeyConditionExpression= Key('classification').eq('INTERNAL') & Key('date').between('2020-04-10', '2020-04-25'))
{'Count': 5667,
'ScannedCount': 5667,
'LastEvaluatedKey': {'classification': 'INTERNAL',
'date': '2020-04-14',
's3Path': '<redacted>'},
'ResponseMetadata': {'RequestId': 'TH3ILO0P47QB7GAU9M3M98BKJVVV4KQNSO5AEMVJF66Q9ASUAAJG',
'HTTPStatusCode': 200,
'HTTPHeaders': {'server': 'Server',
'date': 'Sat, 25 Apr 2020 13:32:36 GMT',
'content-type': 'application/x-amz-json-1.0',
'content-length': '230',
'connection': 'keep-alive',
'x-amz-crc32': '133035383'},
'RetryAttempts': 0}}
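A minimal sketch of that loop, assuming boto3's resource-level `Table.query` as used in the session above. Each response's `Count` only covers the items read for that one page (a single Query call reads at most 1 MB of data), so the sketch feeds `LastEvaluatedKey` back as `ExclusiveStartKey` and adds up the per-page counts. `count_matching_items` is a hypothetical helper name; it takes the query function as a parameter so the paging logic can be exercised without AWS.

```python
from typing import Any, Callable, Dict


def count_matching_items(query: Callable[..., Dict[str, Any]], **query_kwargs: Any) -> int:
    """Return the total Count of a DynamoDB Query, following every page.

    `query` is assumed to behave like boto3's Table.query: it takes keyword
    arguments and returns a response dict that carries 'LastEvaluatedKey'
    only while more pages remain.
    """
    kwargs = dict(query_kwargs, Select="COUNT")  # ask for counts, not items
    total = 0
    while True:
        response = query(**kwargs)
        total += response["Count"]  # count for this page only
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return total  # no more pages to read
        # Resume the next page right after the last key that was read.
        kwargs["ExclusiveStartKey"] = last_key


# Hypothetical usage against the table from the question (needs AWS credentials):
# import boto3
# from boto3.dynamodb.conditions import Key
# table = boto3.resource("dynamodb").Table("my-table")
# count_matching_items(
#     table.query,
#     IndexName="classification-date-index",
#     KeyConditionExpression=Key("classification").eq("INTERNAL")
#     & Key("date").between("2020-04-10", "2020-04-25"),
# )
```

Under this reading, the 18817 reported by the CLI would be the sum of several pages like the 5667-item one above.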
I expected the Boto3 SDK to behave the same way as the AWS CLI, since the response appears to be well under 1 MB. The docs seem slightly contradictory...
The "Paginating Table Query Results" [2] page says:
DynamoDB paginates the results from Query operations. With pagination, the Query results are divided into "pages" of data that are 1 MB in size (or less). An application can process the first page of results, then the second page, and so on. A single Query only returns a result set that fits within the 1 MB size limit.
While the "Query" [1] page says:
A single Query operation will read up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then apply any filtering to the results using FilterExpression.
Just tracked this issue down myself. The AWS CLI automatically fetches every page of the DynamoDB query and sums the counts across them. To stop it from doing this, add --no-paginate to your command, as listed on this page.
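For comparison, here is a Python sketch of roughly what the CLI does by default, using a boto3 client paginator: `boto3.client("dynamodb").get_paginator("query")` yields one response per page, and the sketch adds up `Count` and `ScannedCount` explicitly. The helper name `count_with_paginator` is hypothetical, and because it uses the low-level client, expression attribute values take the typed `{"S": ...}` form from the CLI command in the question.

```python
from typing import Any, Tuple


def count_with_paginator(client: Any, **query_kwargs: Any) -> Tuple[int, int]:
    """Return (Count, ScannedCount) summed over every page of a Query.

    `client` is assumed to be a low-level boto3 DynamoDB client
    (boto3.client("dynamodb")); its "query" paginator handles
    LastEvaluatedKey / ExclusiveStartKey internally.
    """
    paginator = client.get_paginator("query")
    kwargs = dict(query_kwargs, Select="COUNT")  # counts only, no items
    count = scanned = 0
    for page in paginator.paginate(**kwargs):
        # Each page reflects at most 1 MB of data read; add the partial counts.
        count += page["Count"]
        scanned += page["ScannedCount"]
    return count, scanned


# Hypothetical usage mirroring the CLI command in the question (needs AWS credentials):
# import boto3
# count_with_paginator(
#     boto3.client("dynamodb"),
#     TableName="my-table",
#     IndexName="classification-date-index",
#     KeyConditionExpression="classification = :col AND #dt BETWEEN :start AND :end",
#     ExpressionAttributeValues={":col": {"S": "INTERNAL"},
#                                ":start": {"S": "2020-04-10"},
#                                ":end": {"S": "2020-04-25"}},
#     ExpressionAttributeNames={"#dt": "date"},
# )
```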