Reputation: 381
I am trying to scan and update all entries with a specific attribute value in my Amazon DynamoDB table. This will be a one-time operation, and the attribute I am filtering on is not indexed.
If I understand correctly, my only option is to scan the whole Amazon DynamoDB table and update each matching entry as it is encountered.
My table is around 2 GB and holds over 8.5 million records.
Below is a snippet of my script:
scan_kwargs = {
    'FilterExpression': Key('someKey').eq(sometargetNumber)
}
matched_records = my_table.scan(**scan_kwargs)
print 'Number of records impacted by this operations: ' + str(matched_records['Count'])
user_response = raw_input('Would you like to continue?\n')
if user_response == 'y':
    for item in matched_records['Items']:
        print '\nTarget Record:'
        print(item)
        updated_record = my_table.update_item(
            Key={
                'sessionId': item['attr0']
            },
            UpdateExpression="set att1=:t, att2=:s, att3=:p, att4=:k, att5=:si",
            ExpressionAttributeValues={
                ':t': sourceResponse['Items'][0]['att1'],
                ':s': sourceResponse['Items'][0]['att2'],
                ':p': sourceResponse['Items'][0]['att3'],
                ':k': sourceResponse['Items'][0]['att4'],
                ':si': sourceResponse['Items'][0]['att5']
            },
            ReturnValues="UPDATED_NEW"
        )
        print '\nUpdated Target Record:'
        print(updated_record)
else:
    print('Operation terminated!')
I tested the above script (some values were changed for posting on Stack Overflow) in the TEST environment (<1000 records) and everything works fine, but when I run it in the PRODUCTION environment, with 8.5 million records and 2 GB of data, the scan returns 0 records.
Do I need to perform the scan differently, or am I missing something? Or is this just a limitation of the "scan" operation in DynamoDB?
Upvotes: 1
Views: 1973
Reputation: 5747
Sounds like your issue is related to how DynamoDB filters data and paginates results. To review what is happening here, consider the order of operations when executing a DynamoDB scan/query operation while filtering. DynamoDB does the following in this order:
1. DynamoDB query and scan operations return up to 1MB of data at a time. Anything beyond that will be paginated. You know your results are being paginated if DynamoDB returns a LastEvaluatedKey element in your response.
2. Filters apply after the 1MB limit. This is the critical step that often catches people off-guard.
In your situation, the following is happening:
1. You execute a scan operation that reads 1MB of data from the table.
2. DynamoDB applies your filter to that 1MB of data, which eliminates every record read in the first step.
3. DynamoDB returns the remaining items (none, in your case) with a LastEvaluatedKey element, which indicates there is more data to search.
In other words, your filter isn't applying to the entire table. It's applying to 1MB of the table at a time. To get the results you are looking for, you need to execute the scan operation repeatedly, passing each response's LastEvaluatedKey as the ExclusiveStartKey of the next request, until you reach the last "page" of the table (a response with no LastEvaluatedKey).
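As a rough illustration of that loop, here is a minimal sketch. The helper name scan_all is made up for this example; it only assumes the table object has a scan method that behaves like the boto3 Table resource used in the question, returning 'Items' and, while more pages remain, 'LastEvaluatedKey'.

```python
def scan_all(table, **scan_kwargs):
    """Yield every item that passes the filter, following pagination to the end.

    scan_all is a hypothetical helper, not part of boto3. `table` is assumed
    to expose scan(**kwargs) like a boto3 DynamoDB Table resource.
    """
    kwargs = dict(scan_kwargs)  # copy so the caller's kwargs are not mutated
    while True:
        page = table.scan(**kwargs)
        # A page may be empty even though later pages contain matches,
        # because the filter is applied after each 1MB read.
        for item in page.get('Items', []):
            yield item
        last_key = page.get('LastEvaluatedKey')
        if not last_key:
            break  # no LastEvaluatedKey means this was the final page
        # Resume the next request where this page stopped.
        kwargs['ExclusiveStartKey'] = last_key


# Usage sketch, reusing the names from the question (my_table, sometargetNumber):
#   from boto3.dynamodb.conditions import Attr
#   matched = list(scan_all(my_table,
#                           FilterExpression=Attr('someKey').eq(sometargetNumber)))
#   print('Number of records impacted: %d' % len(matched))
```

Because the helper is a generator, you could also update each item as it is yielded instead of collecting all matches first. Either way the whole table is still read, so expect the scan to consume read capacity for the full 2 GB.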
Upvotes: 4