Reputation: 2118
I currently have documents modelling emails that are a bit like the following
{
"AccountId": "AccountId",
"Brand": "MyBrand",
"Product": "MyProduct",
"Metadata": {
"Campaign": "EmailCampaign1",
"Metadata2": "Some other info",
},
"Status": {
"State": "delivered",
"DeliveryEvents": [
{
"Event": "delivered",
"DateTimeOccured": "2019-03-14T12:25:12Z",
},
{
"Event": "processed",
"DateTimeOccured": "2019-03-14T12:25:09Z"
}
]
},
"id": "AnId",
"CreatedAt": 1552566306,
"Stats": {
"DeliveryStats": {
"processed": true,
"deferred": false,
"delivered": true,
"dropped": false,
"bounce": false
}
}
}
For reference, the AccountId
is currently the Partition Key.
And I wanted to do a COUNT
on the DeliveryStats
where you could filter on one or more of the following:
AccountId
Brand
Metadata
(search for key value pair)CreatedAt
(between two dates for example).Here's an example query that I currently have for getting the count of processed items with some filters. Ideally I'd like to get the count of all the different DeliveryStats
but this doesn't seem to be possible right now.
SELECT VALUE COUNT(1) FROM c WHERE c.Stats.DeliveryStats.processed = true AND c.Brand = 'MyBrand' AND c.Metadata.Campaign = 'EmailCampaign1'
Everything being queried on is indexed.
Now this is pretty fast on smaller data sets, as you'd expect, but as soon as your start getting into the millions it seems to be loading each and every document (or I'm really reading the query metrics wrong).
My question is, is this query written correctly? Is there anything more I can do to speed this kind of query up?
Open to restructuring data or storing supplementary data.
Upvotes: 0
Views: 556
Reputation: 5004
The query seems ok in principle, assuming predicate values are indexed and selective enough. Without seeing the data, query metrics and index definition its hard to pinpoint to anything with confidence, but ..
Are your indexed data selective enough individually? CosmosDB indexes store values of single property and hence even if a combination of 3 indexes may be selective enough, CosmosDB most likely would have to pick just one to be the main index to be scanned. If the indexes are individually not selective enough then that could give you poor performance even if the combination was selective enough.
If that's the case, you could consider to merge the values pf individually not selective enough columns values into a single hash-indexed property for lookup. Ex:
{
"AccountId": "AccountId",
"Brand": "MyBrand",
"Metadata": {
"Campaign": "EmailCampaign1",
},
...
"MergedForLookup": "MyBrand_EmailCampaign_processed"
...
}
Obviously supporting N optional filters with any possible combination of what's given is tricky, but you get the idea: trade storage for speed.
If you don't find a good denormalization helpers and absolutely need to search by many combinations, then you may have to include an external index for searching. For example, check out how to add Azure Search.
Next candidate is to do with the fact that you are doing a cross-partition query. Basically a query on every partition = N queries. If your data grows to millions then most likely it has many partitions (= accountIds) + your data would be split up to multiple physical partitions internally and that definitely will have an impact. If possible, you should check if including filter on AccountId
would alleviate the case. If possible, make 'AccountId' filter mandatory.
If the query is slower than you expect even with single-partition query, then there has been reports how count()
queries are not as effective as one may expect. Check the following issue and give your votes:
"Make COUNT() aware of indexes" on CosmosDB Feedback.
Yes, AFAIK that's not yet supported in single query, but in the works. Check the request "Add Group By support for Aggregate Functions".
Right now, if there are relatively few fixed values then just do those 5 or whatever queries per each stat in succession. If each of them is properly using partition/index then it should still be lightning-fast (5X lightning fast, to be precise:P).
Upvotes: 1