Reputation: 2585
I'm creating a logging system to monitor our (200 ish) main application installations, and Cosmos db seems like a good fit due to the amount of data we'll collect, and to allow a varying schema for the log data (particularly the Tags
array - see document schema below).
But, never having used CosmosDb before I'm slightly unsure of what to use for my partition key.
If I partitioned by CustomerId
, there would likely be several Gb
of data in each of the 200 partitions, and the data will usually be queried by CustomerId
, so this was my first choice for the partition key.
However I was planning to have a 'log stream' view in the logging system, showing logs coming in for all customers.
If so, is there an obvious way to avoid / limit the cost & speed implications of this cross partition querying? (Other than just taking out the log stream view for all customers!)
{
"CustomerId": "be806507-7cc4-4db4-881b",
"CustomerName": "Our Customer",
"SystemArea": 1,
"SystemAreaName": "ExchangeSync",
"Message": "Updated OK",
"Details": "",
"LogLevel": 2,
"Timestamp": "2018-11-23T10:59:29.7548888+00:00",
"Tags": {
"appointmentId": "109654",
"appointmentGroupId": "86675",
"exchangeId": "AAMkA",
"exchangeAlias": "[email protected]"
}
}
(Note - There isn't a defined list of SystemArea
types we'll use yet, but it would be a lot fewer than the 200 customers)
Upvotes: 0
Views: 3530
Reputation: 7200
Cross partition queries should be avoided as much as possible. If your querying is likely to happen with customer id then the customerid is a good logical partition key. However you have to keep in mind that there is a limit of 10GB per logical partition data.
A cross partition query across the whole database will lead to a very slow and very expensive operation but if it's not functionality critical and it's just used for infrequent reporting, it's not too much of a problem.
Upvotes: 1