Person1

Reputation: 149

Dynamodb bulk query

I have an index used for bulk operations on collections, and it is being throttled. To mitigate this, I am planning to shard the index so that each pk is split across some number of partitions. At the moment a delete operation runs on the base table using the index: we query a set number of items for a pk in the index, delete them, then repeat until finished.

The problem I see is that if I do something similar with the sharded partition keys, I will just end up iterating through each shard in turn and hit the same throttling on the base table when deleting. Is there a way to issue a bulk query in DynamoDB so that I can, for example, check all shards and retrieve a set of items evenly distributed across them?

Upvotes: 0

Views: 303

Answers (2)

Leeroy Hannigan

Reputation: 19713

It's important to understand the cause and magnitude of your GSI throttling. Is it write or read throttling you are experiencing? Is your GSI partition key of low cardinality?

Assuming writes are the issue, you only need to shard the GSI keys that consume more than 1000 WCU per second. For example, if your expected throughput is 4000 WCU per second, you only need 4-5 shards. You can then use the PartiQL API to run a "batch query" that retrieves all the items in a single call:

SELECT * FROM "mytable"."index" WHERE GSIPK IN ['a-1','a-2','a-3','a-4']
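As a rough sketch of how that statement could be built and paged through from Python: the helper names below (`shard_keys`, `pick_shard`, `build_statement`, `query_all_shards`) are made up for illustration, the `-1` to `-N` suffix scheme is just the one used in the example above, and picking a shard at random on write is only one possible strategy. The `client` is assumed to be a boto3 DynamoDB client, whose `execute_statement` takes `Statement`, `Parameters` and `NextToken`. I have assumed `?` placeholders are accepted inside the `IN` list, so check that against your setup.

```python
import random
from typing import Any, Dict, Iterator, List


def shard_keys(base_pk: str, shard_count: int) -> List[str]:
    """Return every sharded GSI partition key, e.g. 'a-1' .. 'a-4'."""
    return [f"{base_pk}-{n}" for n in range(1, shard_count + 1)]


def pick_shard(base_pk: str, shard_count: int) -> str:
    """Pick a shard at random when writing an item (hypothetical strategy)."""
    return f"{base_pk}-{random.randint(1, shard_count)}"


def build_statement(table: str, index: str, key_attr: str,
                    keys: List[str]) -> Dict[str, Any]:
    """Build the ExecuteStatement arguments for an IN query over all shards."""
    placeholders = ", ".join("?" for _ in keys)
    statement = (
        f'SELECT * FROM "{table}"."{index}" '
        f"WHERE {key_attr} IN [{placeholders}]"
    )
    # Parameters use the low-level AttributeValue format ({"S": ...} for strings).
    return {"Statement": statement, "Parameters": [{"S": k} for k in keys]}


def query_all_shards(client: Any, table: str, index: str, key_attr: str,
                     keys: List[str]) -> Iterator[Dict[str, Any]]:
    """Yield items from every shard, following NextToken until exhausted."""
    request = build_statement(table, index, key_attr, keys)
    while True:
        response = client.execute_statement(**request)
        yield from response.get("Items", [])
        token = response.get("NextToken")
        if not token:
            break
        request["NextToken"] = token
```

With a real client this would be called as `query_all_shards(boto3.client("dynamodb"), "mytable", "index", "GSIPK", shard_keys("a", 4))`. Results can still come back in pages, so the loop keeps going until no `NextToken` is returned.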

This article contains more info on sharding Item Collections on DynamoDB:

https://medium.com/@leeroy.hannigan/optimizing-dynamodb-queries-using-key-sharding-f3eb4d7f78f7

Upvotes: 1

Borislav Stoilov

Reputation: 3677

Are you talking about global secondary indexes? If so, each GSI has its own capacity, and splitting the index into multiple indexes will certainly have a positive impact.

That aside, are you able to use TTL instead of querying and deleting items? TTL is free, runs in the background, and will cause no throttling whatsoever.

From the docs

TTL is useful if you store items that lose relevance after a specific time. The following are example TTL use cases:

Remove user or sensor data after one year of inactivity in an application.

Archive expired items to an Amazon S3 data lake via Amazon DynamoDB Streams and AWS Lambda.

Retain sensitive data for a certain amount of time according to contractual or regulatory obligations.
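If TTL fits your case, a minimal sketch of stamping items with an expiry could look like the following. The attribute name `expires_at` and the helpers `with_ttl` and `ttl_specification` are made up for illustration. DynamoDB expects the TTL attribute to be a Number holding a Unix epoch timestamp in seconds. Expired items are removed in the background some time after they expire, not at that exact instant.

```python
import time
from typing import Any, Dict, Optional

TTL_ATTRIBUTE = "expires_at"  # hypothetical attribute name, pick your own


def with_ttl(item: Dict[str, Any], ttl_seconds: int,
             now: Optional[float] = None) -> Dict[str, Any]:
    """Return a copy of a low-level item with an epoch-seconds expiry added."""
    current = int(time.time()) if now is None else int(now)
    stamped = dict(item)
    # Numbers are sent as strings in the AttributeValue format.
    stamped[TTL_ATTRIBUTE] = {"N": str(current + ttl_seconds)}
    return stamped


def ttl_specification() -> Dict[str, Any]:
    """Build the TimeToLiveSpecification argument for update_time_to_live."""
    return {"Enabled": True, "AttributeName": TTL_ATTRIBUTE}
```

TTL would then be enabled once per table with boto3's `client.update_time_to_live(TableName="mytable", TimeToLiveSpecification=ttl_specification())`, and each item written with `client.put_item(TableName="mytable", Item=with_ttl(item, 3600))`.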

Upvotes: 0
