How to do fast(er) aggregations on large numbers of records in Cosmos DB?

Question

I currently have documents modelling emails that look roughly like the following:

{
    "AccountId": "AccountId",
    "Brand": "MyBrand",
    "Product": "MyProduct",
    "Metadata": {
        "Campaign": "EmailCampaign1",
        "Metadata2": "Some other info"
    },
    "Status": {
        "State": "delivered",
        "DeliveryEvents": [
            {
                "Event": "delivered",
                "DateTimeOccured": "2019-03-14T12:25:12Z"
            },
            {
                "Event": "processed",
                "DateTimeOccured": "2019-03-14T12:25:09Z"
            }
        ]
    },
    "id": "AnId",
    "CreatedAt": 1552566306,
    "Stats": {
        "DeliveryStats": {
            "processed": true,
            "deferred": false,
            "delivered": true,
            "dropped": false,
            "bounce": false
        }
    }
}

For reference, the AccountId is currently the Partition Key.

I want to run a COUNT over the DeliveryStats flags, filtering on one or more of the following:

AccountId
Brand
Metadata (search for a key-value pair)
CreatedAt (between two dates, for example)

Here's an example query that I currently have for getting the count of processed items with some filters. Ideally I'd like to get the count of all the different DeliveryStats but this doesn't seem to be possible right now.

SELECT VALUE COUNT(1) FROM c WHERE c.Stats.DeliveryStats.processed = true AND c.Brand = 'MyBrand' AND c.Metadata.Campaign = 'EmailCampaign1'

Everything being queried on is indexed.

Now this is pretty fast on smaller data sets, as you'd expect, but as soon as you start getting into the millions of documents it seems to be loading each and every document (or I'm badly misreading the query metrics).

My question is, is this query written correctly? Is there anything more I can do to speed this kind of query up?

Open to restructuring data or storing supplementary data.
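
To make "storing supplementary data" concrete, one possible shape is a small set of pre-aggregated counters that a count query could sum instead of touching every email document. The sketch below is plain Python over in-memory dictionaries, not Cosmos DB SDK code. The bucket layout (account, brand, campaign, UTC day) and the names `bucket_key`, `build_summaries`, `total` and `make_email` are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Flag names copied from Stats.DeliveryStats in the sample document.
FLAGS = ("processed", "deferred", "delivered", "dropped", "bounce")


def bucket_key(doc):
    """Hypothetical summary key: one bucket per account, brand, campaign and UTC day."""
    day = datetime.fromtimestamp(doc["CreatedAt"], tz=timezone.utc).strftime("%Y-%m-%d")
    return (doc["AccountId"], doc["Brand"], doc["Metadata"]["Campaign"], day)


def build_summaries(documents):
    """Fold email documents into per-bucket counters, one counter per flag."""
    summaries = defaultdict(lambda: dict.fromkeys(FLAGS, 0))
    for doc in documents:
        bucket = summaries[bucket_key(doc)]
        for flag in FLAGS:
            if doc["Stats"]["DeliveryStats"].get(flag):
                bucket[flag] += 1
    return dict(summaries)


def total(summaries, flag, brand=None, campaign=None, day_from=None, day_to=None):
    """Sum one flag over the buckets that pass the optional filters."""
    result = 0
    for (_account, b, c, day), counts in summaries.items():
        if brand is not None and b != brand:
            continue
        if campaign is not None and c != campaign:
            continue
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if day_from is not None and day < day_from:
            continue
        if day_to is not None and day > day_to:
            continue
        result += counts[flag]
    return result


def make_email(brand, campaign, created_at, **flags):
    """Build a trimmed-down stand-in for the sample document (illustration only)."""
    stats = dict.fromkeys(FLAGS, False)
    stats.update(flags)
    return {
        "AccountId": "AccountId",
        "Brand": brand,
        "Metadata": {"Campaign": campaign},
        "CreatedAt": created_at,
        "Stats": {"DeliveryStats": stats},
    }


emails = [
    make_email("MyBrand", "EmailCampaign1", 1552566306, processed=True, delivered=True),
    make_email("MyBrand", "EmailCampaign1", 1552652706, processed=True, bounce=True),
    make_email("OtherBrand", "EmailCampaign1", 1552566306, processed=True),
]
summaries = build_summaries(emails)
print(total(summaries, "processed", brand="MyBrand", campaign="EmailCampaign1"))  # prints 2
```

The trade-off in this sketch is that filters only work on dimensions baked into the bucket key: date ranges are limited to whole days, and an arbitrary Metadata key-value filter would need its own dimension.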

Answers (1)

Individual index selectivity

Cross partition query

Is count() using index effectively

.. count of all the different DeliveryStats.
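
As a rough illustration of what "count of all the different DeliveryStats" asks for, the sketch below tallies every flag in one pass over the filtered documents, rather than running one COUNT per flag. It is plain Python over in-memory dictionaries and only shows the intended result shape. It is not a Cosmos DB query, and whether the service can return the same thing from a single query is not established here. The names `matches`, `count_all_flags` and `sample`, and the sample data, are hypothetical.

```python
# Flag names copied from Stats.DeliveryStats in the sample document.
FLAGS = ("processed", "deferred", "delivered", "dropped", "bounce")


def matches(doc, brand=None, campaign=None, created_from=None, created_to=None):
    """Apply the optional filters from the question; None means 'do not filter'."""
    if brand is not None and doc["Brand"] != brand:
        return False
    if campaign is not None and doc["Metadata"].get("Campaign") != campaign:
        return False
    if created_from is not None and doc["CreatedAt"] < created_from:
        return False
    if created_to is not None and doc["CreatedAt"] > created_to:
        return False
    return True


def count_all_flags(documents, **filters):
    """Return one count per DeliveryStats flag for the documents that match."""
    counts = dict.fromkeys(FLAGS, 0)
    for doc in documents:
        if not matches(doc, **filters):
            continue
        stats = doc["Stats"]["DeliveryStats"]
        for flag in FLAGS:
            if stats.get(flag):
                counts[flag] += 1
    return counts


def sample(brand, campaign, created_at, **flags):
    """Build a trimmed-down stand-in for the question's document (illustration only)."""
    stats = dict.fromkeys(FLAGS, False)
    stats.update(flags)
    return {
        "Brand": brand,
        "Metadata": {"Campaign": campaign},
        "CreatedAt": created_at,
        "Stats": {"DeliveryStats": stats},
    }


docs = [
    sample("MyBrand", "EmailCampaign1", 1552566306, processed=True, delivered=True),
    sample("MyBrand", "EmailCampaign1", 1552570000, processed=True, bounce=True),
    sample("OtherBrand", "EmailCampaign1", 1552580000, processed=True, dropped=True),
    sample("MyBrand", "EmailCampaign2", 1552590000, processed=True, deferred=True),
]

print(count_all_flags(docs, brand="MyBrand", campaign="EmailCampaign1"))
# prints {'processed': 2, 'deferred': 0, 'delivered': 1, 'dropped': 0, 'bounce': 1}
```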
