Reputation: 359
We have a solution that uses a microservice approach. One of our microservices is responsible for pushing data to Cosmos. Our Cosmos database uses serverless provisioning, which has a 5,000 RU/s limit.
The data we are inserting into Cosmos looks like the sample below. There are 10 columns, and we are pushing a batch containing 5,807 rows of this data.
Id | CompKey | Primary Id | Secondary Id | Type | DateTime | Item | Volume | Price | Fee |
---|---|---|---|---|---|---|---|---|---|
1 | Veg_Buy | csd2354csd | dfg564dsfg55 | Buy | 30/08/21 | Leek | 10 | 0.75 | 5.00 |
2 | Veg_Buy | sdf15s1dfd | sdf31sdf654v | Buy | 30/08/21 | Corn | 5 | 0.48 | 3.00 |
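For reference, here is a sketch of how one of these rows might land in Cosmos as a JSON document (the field names mirror the columns above; using CompKey as the partition key and an ISO 8601 date are illustrative assumptions, not our confirmed setup):

```python
# One table row as a Cosmos DB item (a Python dict that serializes to JSON).
item = {
    "id": "1",                            # Cosmos requires "id" to be a string
    "CompKey": "Veg_Buy",                 # assumed partition key (/CompKey)
    "PrimaryId": "csd2354csd",
    "SecondaryId": "dfg564dsfg55",
    "Type": "Buy",
    "DateTime": "2021-08-30T00:00:00Z",   # ISO 8601 keeps dates sortable
    "Item": "Leek",
    "Volume": 10,
    "Price": 0.75,
    "Fee": 5.00,
}
```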
We retrieve data from multiple sources, normalize it, and send it to Cosmos as one bulk execution. The retrieval process runs every hour, so we spike the Cosmos database once per hour with the retrieved data and then send nothing until the next retrieval cycle. If this hourly peak is the problem, what remedies exist for such a scenario?
Can anyone shed some light on what we need to do to overcome this issue? Perhaps we are missing a setting when creating the Cosmos database, or possibly this has something to do with partitioning?
Upvotes: 4
Views: 2113
Reputation: 8783
You can mostly determine these things by looking at the metrics published in the Azure Portal. This doc is a good place to start: Monitor and debug with insights in Azure Cosmos DB.
In particular, I would look at the section titled "Determine the throughput consumption by a partition key range".
If you are not dealing with a hot partition key, you may want to look at options to throttle your writes. This could include reducing your batch size and putting the write operations in a loop with a one-second timer that pauses once the consumed throughput reaches 5,000 RU/s; a sketch of that idea follows below. You could also look at queue-based load leveling: put the writes on a queue in front of Cosmos and stream them in at a steady rate (second sketch below).
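Here is a minimal sketch of that throttling loop in Python with the azure-cosmos SDK. The account endpoint, key, and database/container names are placeholders, and reading the request charge from the last response headers is an assumption about the SDK surface worth verifying against your SDK version:

```python
import time
from azure.cosmos import CosmosClient  # pip install azure-cosmos

# Hypothetical connection details -- substitute your own.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("<database>").get_container_client("<container>")

RU_BUDGET_PER_SECOND = 5000  # the serverless limit mentioned in the question

def write_throttled(items):
    """Write items one at a time, sleeping out the rest of the current
    one-second window whenever the RU budget for it is spent."""
    window_start = time.monotonic()
    consumed = 0.0
    for item in items:
        container.upsert_item(item)
        # Assumption: the charge of the last operation is exposed via the
        # response headers on the underlying client connection.
        charge = float(
            container.client_connection.last_response_headers["x-ms-request-charge"]
        )
        consumed += charge
        if consumed >= RU_BUDGET_PER_SECOND:
            elapsed = time.monotonic() - window_start
            if elapsed < 1.0:
                time.sleep(1.0 - elapsed)  # wait out the rest of the window
            window_start = time.monotonic()
            consumed = 0.0
```

A production version should also handle 429 (request rate too large) responses; the SDK can retry those automatically, but counting charges yourself keeps you from hitting them in the first place.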
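And for the queue-based load-leveling option, a sketch using an Azure Storage queue as the buffer (Service Bus would work similarly). The producer enqueues one JSON row per message each hour instead of writing straight to Cosmos, and the consumer below drains at a steady rate. The queue name, connection string, and drain rate are assumptions; `container` is the Cosmos container client from the previous sketch:

```python
import json
import time
from azure.storage.queue import QueueClient  # pip install azure-storage-queue

# Hypothetical queue in front of Cosmos; the hourly batch lands here first.
queue = QueueClient.from_connection_string("<connection-string>", "cosmos-ingest")

ROWS_PER_SECOND = 400  # tune so the steady drain stays well under 5,000 RU/s

def drain_forever(container):
    """Consumer loop: pull a small batch each second and write it to Cosmos,
    turning the hourly spike into a flat, predictable load."""
    while True:
        messages = queue.receive_messages(max_messages=ROWS_PER_SECOND)
        for msg in messages:
            container.upsert_item(json.loads(msg.content))
            queue.delete_message(msg)  # remove only after a successful write
        time.sleep(1)
```

The design point is that Cosmos only ever sees the consumer's steady trickle, so the 5,000 RU/s ceiling bounds your drain rate instead of failing your hourly batch.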
Upvotes: 3