Kumar Vivek

Reputation: 381

Amazon DynamoDB scan is not scanning complete table

I am trying to scan and update all entries with a specific attribute value in my Amazon DynamoDB table. This will be a one-time operation, and the attribute I am filtering on is not indexed.

If I understand correctly, my only option is to scan the whole Amazon DynamoDB table and update each matching entry as it is encountered.

My table size is around 2 GB and my table has over 8.5 million records.

Below is a snippet of my script:

scan_kwargs = {
    'FilterExpression': Key('someKey').eq(sometargetNumber)
}
matched_records = my_table.scan(**scan_kwargs)

print 'Number of records impacted by this operations: ' + str(matched_records['Count'])
user_response = raw_input('Would you like to continue?\n')

if user_response == 'y':
    for item in matched_records['Items']:
        print '\nTarget Record:'
        print(item)
        updated_record = my_table.update_item(
            Key={
                'sessionId': item['attr0']
            },
            UpdateExpression="set att1=:t, att2=:s, att3=:p, att4=:k, att5=:si",
            ExpressionAttributeValues={
                ':t': sourceResponse['Items'][0]['att1'],
                ':s': sourceResponse['Items'][0]['att2'],
                ':p': sourceResponse['Items'][0]['att3'],
                ':k': sourceResponse['Items'][0]['att4'],
                ':si': sourceResponse['Items'][0]['att5']
            },
            ReturnValues="UPDATED_NEW"
        )
        print '\nUpdated Target Record:'
        print(updated_record)
else:
    print('Operation terminated!')

I tested the above script (some values were changed for posting on Stack Overflow) in the TEST environment (<1000 records) and everything works fine, but when I run it in the PRODUCTION environment, with 8.5 million records and 2 GB of data, the script scans 0 records.


Do I need to perform the scan differently, or am I missing something? Or is this just a limitation of the "scan" operation in DynamoDB?

Upvotes: 1

Views: 1973

Answers (1)

Seth Geoghegan

Reputation: 5747

Sounds like your issue is related to how DynamoDB filters data and paginates results. To review what is happening here, consider the order of operations when executing a DynamoDB scan/query operation while filtering. DynamoDB does the following in this order:

  1. Read items from the table
  2. Apply Filter
  3. Return Results

DynamoDB query and scan operations return up to 1MB of data at a time. Anything beyond that will be paginated. You know your results are being paginated if DynamoDB returns a LastEvaluatedKey element in your response.
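To make the read-then-filter order concrete, here is a toy model in plain Python. It is only an illustration, not DynamoDB itself: `toy_scan` is a made-up function, the page limit is counted in items rather than bytes, and `LastEvaluatedKey` is a plain integer offset here, whereas the real service returns a key dictionary.

```python
def toy_scan(rows, page_size, keep, start=0):
    """Toy model of a single scan call: read one page first, filter it second."""
    page = rows[start:start + page_size]        # 1. read items, up to the page limit
    matched = [r for r in page if keep(r)]      # 2. apply the filter to that page only
    response = {
        'Items': matched,
        'Count': len(matched),                  # items left after filtering
        'ScannedCount': len(page),              # items read before filtering
    }
    next_start = start + len(page)
    if next_start < len(rows):                  # part of the table is still unread
        response['LastEvaluatedKey'] = next_start
    return response                             # 3. return results


# Ten rows; only the last one matches the filter.
rows = [{'id': i, 'someKey': 7 if i == 9 else 0} for i in range(10)]
first = toy_scan(rows, page_size=4, keep=lambda r: r['someKey'] == 7)
print(first['Count'], first['ScannedCount'], 'LastEvaluatedKey' in first)
# prints: 0 4 True
```

The first call reads four rows, matches none of them, and still reports that more data remains. A single call can therefore come back with a Count of 0 even though a matching row exists further along.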

Filters apply after the 1MB limit. This is the critical step that often catches people off-guard. In your situation, the following is happening:

  1. You execute a scan operation that reads 1MB of data from the table.
  2. You apply a filter to the 1MB response, which results in all of the records in the first step being eliminated from the response.
  3. DDB returns the remaining items with a LastEvaluatedKey element, which indicates there is more data to search.

In other words, your filter isn't applying to the entire table. It's applying to 1MB of the table at a time. In order to get the results you are looking for, you are going to need to execute the scan operation repeatedly until you reach the last "page" of the table.
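A minimal sketch of that repeated scan in Python might look like the following. The helper name `scan_all` is made up for illustration. It assumes `table` is any object whose `scan` method behaves like the boto3 Table resource's: it accepts an `ExclusiveStartKey` keyword argument and returns a dict with `Items` and, when more pages remain, `LastEvaluatedKey`.

```python
def scan_all(table, **scan_kwargs):
    """Yield matching items from every page of a scan, not just the first.

    `table` is assumed to expose a boto3-style scan(**kwargs) method.
    """
    kwargs = dict(scan_kwargs)          # copy so the caller's dict is not mutated
    while True:
        page = table.scan(**kwargs)
        for item in page.get('Items', []):
            yield item
        last_key = page.get('LastEvaluatedKey')
        if last_key is None:            # no LastEvaluatedKey: this was the last page
            break
        # Resume the next scan call where this page stopped.
        kwargs['ExclusiveStartKey'] = last_key
```

With the table from the question, this could be used as `matched = list(scan_all(my_table, **scan_kwargs))`, after which `len(matched)` gives the number of records that matched across the whole table. Scanning 8.5 million records this way reads every item, so it consumes read capacity for the full table even when few items match.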

Upvotes: 4
