Reputation: 11
I'm trying to find the best practice for doing hundreds of parallel DynamoDB queries for one single request. I am currently using Python, but I'm open to any language or framework that works best for this use case. Here is basically what I want to do; I've shortened it to only 4 values here, but in the end I'd like it to query 500 at once.
import boto3
from boto3.dynamodb.conditions import Key

# Shortened to 4 keys here; the real request needs ~500 of these.
variables = {'random1': None, 'random2': None, 'random3': None, 'random500': None}

table = boto3.resource('dynamodb', region_name='eu-west-1').Table('sometable')

# Sequential: one COUNT query per key, each around 40ms.
for v in variables:
    variables[v] = table.query(KeyConditionExpression=Key('k').eq(v), Select='COUNT')['Count']

print(variables)
# expected output: {'random1': 12, 'random2': 30, 'random3': 230, 'random500': 5}
So I'm running Select='COUNT' queries to get the item count for each key in the table, and the resulting dict is what the service needs to return. The response time for each individual query is great, around 40ms, but running them sequentially obviously scales linearly, which doesn't work: I want all 500 counts back in under 150ms total.
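The obvious thing I can think of is fanning the queries out over a thread pool using the low-level client (which is documented as thread-safe, unlike the resource objects). This is just a sketch of what I mean; the pool size and the connection-pool bump are guesses:

import boto3
from botocore.config import Config
from concurrent.futures import ThreadPoolExecutor

# The low-level client is thread-safe; raise the connection pool above the
# default of 10 so the threads aren't serialized waiting for connections.
client = boto3.client('dynamodb', region_name='eu-west-1',
                      config=Config(max_pool_connections=50))

def count_for(key):
    # Same COUNT query as above, in low-level (typed) syntax.
    resp = client.query(
        TableName='sometable',
        KeyConditionExpression='k = :v',
        ExpressionAttributeValues={':v': {'S': key}},
        Select='COUNT',
    )
    return key, resp['Count']

keys = ['random1', 'random2', 'random3', 'random500']  # ...up to 500 keys
with ThreadPoolExecutor(max_workers=50) as pool:
    variables = dict(pool.map(count_for, keys))

print(variables)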
Has anyone done anything similar? Any advice would be greatly appreciated!
Upvotes: 1
Views: 1996
Reputation: 23823
My advice would be to not do this.
If you need aggregations in DDB, the preferred approach would be to enable DynamoDB Streams and have a Lambda update/write an aggregation entry in the existing table (or a new one).
Here's a good article... Real-Time Aggregation with DynamoDB Streams
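For illustration only, a stream-triggered handler in that pattern might look roughly like this. The aggregate table name, the counter attribute, and the assumption that the base table's partition key is a string called 'k' are all mine, not from the article:

import boto3

# Hypothetical aggregate table: one item per key 'k', holding a running count.
agg_table = boto3.resource('dynamodb', region_name='eu-west-1').Table('sometable-counts')

def handler(event, context):
    # Each stream record describes one item-level change in the base table.
    for record in event['Records']:
        key = record['dynamodb']['Keys']['k']['S']
        if record['eventName'] == 'INSERT':
            delta = 1
        elif record['eventName'] == 'REMOVE':
            delta = -1
        else:  # MODIFY doesn't change the per-key item count
            continue
        # ADD is atomic and creates the aggregate item if it doesn't exist yet.
        agg_table.update_item(
            Key={'k': key},
            UpdateExpression='ADD #c :d',
            ExpressionAttributeNames={'#c': 'cnt'},
            ExpressionAttributeValues={':d': delta},
        )

With something like that in place, fetching the 500 counts becomes a handful of reads against the precomputed aggregate items instead of 500 COUNT queries per request.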
Upvotes: 2