Stilgar
Stilgar

Reputation: 23591

In Azure Cosmos DB do I need to add the partition key to my query where clause?

I have a collection in Azure Cosmos DB with iot messages (called DeviceEvents). The partition key is application id. I want to do a query by device id (each device belongs to exactly one application). So I have a query like this

SELECT VALUE root 
FROM root 
WHERE root["ApplicationId"] = 69 AND root["DeviceId"] = 2978 
     AND root["TimeStamp"] >= "2021-01-30T20:30:05.1635579Z" 
     AND root["TimeStamp"] <= "2021-02-19T20:30:05.1635969Z" 
ORDER BY root["TimeStamp"] DESC OFFSET 0 LIMIT 30

When I execute the query like this I get Request Charge 10.96 RUs, Index lookup time 2.22 ms, Document load time 0.41 ms and Query engine execution time 0.24 ms

When I execute the query without the partition key

SELECT VALUE root 
FROM root 
WHERE root["DeviceId"] = 2978 
     AND root["TimeStamp"] >= "2021-01-30T20:30:05.1635579Z" 
     AND root["TimeStamp"] <= "2021-02-19T20:30:05.1635969Z" 
ORDER BY root["TimeStamp"] DESC OFFSET 0 LIMIT 30

When I execute the query like this I get Request Charge 10.45 RUs, Index lookup time 1.91 ms, Document load time 0.5 ms and Query engine execution time 0.24 ms

While the numbers vary the query with the partition key consistently consumes more RU and has higher index lookup time.

I don't have enough data for Cosmos DB to create different physical partitions right now but I will probably need it in the future. My relevant indexing policy is this

"compositeIndexes": [
        [
            {
                "path": "/DeviceId",
                "order": "ascending"
            },
            {
                "path": "/TimeStamp",
                "order": "descending"
            }
        ] 

So my questions are

  1. Do I need the partition key in the query?
  2. Do I need the partition key in the index definition?

Upvotes: 0

Views: 766

Answers (1)

Mark Brown
Mark Brown

Reputation: 8793

The reason you're getting confusing query stats is because the amount data is too small to provide meaningful results.

With a small amount of data (approx 20GB or less) you'll only be on a single physical partition. Cross-partition queries run just as fast as partitioned queries when on the same physical partition.

Where things start to blow up is when the database grows (scales). If you design your database to have a high number of cross-partition queries your database, by design, will not scale. So you definitely need (or should try as much as possible) to use the partition key in your queries, especially high volume queries.

I would also add TimeStamp in both an ascending and descending composite index.

The other thing you mentioned is every device belongs to the same applicationId. If that is the case then your container cannot grow larger than 20GB. If every device in this app has applicationId of 69 then you should redesign this container and find a new partition key. If your queries are always by device Id then that would make a much better partition key.

Upvotes: 2

Related Questions