Robert

Reputation: 3483

Python DynamoDB scan operation does not return all records

My DynamoDB table has 161712 records. When I scan it without any filters, the scan count I get back is only 10589.

This is my table's metadata:

{
  "AttributeDefinitions": [
    {
      "AttributeName": "question_id",
      "AttributeType": "N"
    },
    {
      "AttributeName": "timestamp",
      "AttributeType": "S"
    }
  ],
  "TableName": "users_answers",
  "KeySchema": [
    {
      "AttributeName": "timestamp",
      "KeyType": "HASH"
    },
    {
      "AttributeName": "question_id",
      "KeyType": "RANGE"
    }
  ],
  "TableStatus": "ACTIVE",
  "CreationDateTime": "2017-09-12T12:33:22.615Z",
  "ProvisionedThroughput": {
    "LastIncreaseDateTime": "2017-09-12T16:46:26.742Z",
    "NumberOfDecreasesToday": 0,
    "ReadCapacityUnits": 80,
    "WriteCapacityUnits": 80
  },
  "TableSizeBytes": 16014441,
  "ItemCount": 161712
}

When I run a scan operation on the table above, I get only 10589 records:

table = dynamo.get_table('answer_options')
x = table.scan()

Please suggest how I can fetch all the records from the table.

Env: Python 3.5.1, flask dynamodb

Thanks in advance.

Upvotes: 1

Views: 4667

Answers (1)

Noel Llevares

Reputation: 16037

A single DynamoDB Scan request returns at most 1 MB of data. You have to loop, making additional requests until you have retrieved the entire dataset.
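As a rough sanity check, the numbers in the question line up with that limit. The sketch below uses only the `TableSizeBytes` and `ItemCount` values from the table metadata, and assumes that 1 MB means 1024 * 1024 bytes and that items are roughly uniform in size; both are assumptions, so treat the result as an estimate only.

```python
import math

# Values copied from the table metadata in the question.
table_size_bytes = 16014441
item_count = 161712

# Assumption: the per-request limit is 1 MB = 1024 * 1024 bytes.
page_limit_bytes = 1024 * 1024

# Average item size, assuming items are roughly the same size.
avg_item_bytes = table_size_bytes / item_count

# Estimated number of items that fit in one 1 MB page.
items_per_page = page_limit_bytes / avg_item_bytes

# Estimated number of Scan requests needed to read the whole table.
pages_needed = math.ceil(table_size_bytes / page_limit_bytes)

print(round(avg_item_bytes, 1), round(items_per_page), pages_needed)
```

The estimate comes out very close to the 10589 items returned by the single scan, which suggests the scan simply stopped at the first page.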

From DynamoDB docs:

DynamoDB paginates the results from Scan operations. With pagination, the Scan results are divided into "pages" of data that are 1 MB in size (or less). An application can process the first page of results, then the second page, and so on.

A single Scan will only return a result set that fits within the 1 MB size limit. To determine whether there are more results, and to retrieve them one page at a time, applications should do the following:

  1. Examine the low-level Scan result:

    • If the result contains a LastEvaluatedKey element, proceed to step 2.
    • If there is not a LastEvaluatedKey in the result, then there are no more items to be retrieved.
  2. Construct a new Scan request, with the same parameters as the previous one—but this time, take the LastEvaluatedKey value from step 1 and use it as the ExclusiveStartKey parameter in the new Scan request.

  3. Run the new Scan request.

  4. Go to step 1.
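The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not tested against your setup: it assumes `table` behaves like a boto3 `Table` resource, i.e. `table.scan()` returns a dict with an `Items` list and an optional `LastEvaluatedKey`, and accepts `ExclusiveStartKey` as a keyword argument. The helper name `scan_all_items` is made up for illustration.

```python
def scan_all_items(table, **scan_kwargs):
    """Return every item from the table by following LastEvaluatedKey.

    Assumes `table` exposes a boto3-style scan(**kwargs) method.
    """
    items = []
    kwargs = dict(scan_kwargs)  # copy so the caller's arguments are not mutated

    while True:
        response = table.scan(**kwargs)
        items.extend(response.get("Items", []))

        # Step 1: no LastEvaluatedKey means there are no more pages.
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break

        # Step 2: same parameters, but resume after the last evaluated key.
        kwargs["ExclusiveStartKey"] = last_key

    return items


# Hypothetical usage, assuming boto3 is installed and configured:
#   import boto3
#   table = boto3.resource("dynamodb").Table("users_answers")
#   all_items = scan_all_items(table)
#   print(len(all_items))
```

Keep in mind that this holds the whole result set in memory and consumes read capacity for every page, so for larger tables you may prefer to process each page as it arrives instead of collecting everything first.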

Upvotes: 8
