DraganB
DraganB

Reputation: 1138

CosmosDB query on date range + index

I have a cosmos DB whose size is around 100GB. I successfully create a nice partition key, i have around 4600 partition on 70M records, but I still need to query on two datetime fields that are stored as a string, not in an epoch format.

Example json:

"someField1": "UNKNOWN",
"someField2": "DATA",
"endDate": 7014541201,
"startDate": 7054864502,
"someField3": "0",
"someField3": "0",

i notice when i do select * from tbl and when i do select * from tbl where startDate > {someDate} AND endDate<{someDate1} latency different is around 1s, so this filtering does not decrease my latency time. Is it better to store date types as number? Does cosmos have better performance on epoch query ranges? I am using SQL API.

Also when i try to add hash indexes on a startDate and endDate he basically convert that into two indexes. Example:

 "path": "/startDate/?",
        "indexes": [
            {
                "kind": "Hash",
                "dataType": "String",
                "precision": 3
            }
            ]
    }, 

this is converted to

"path": "/startDate/?",
        "indexes": [
            {
                "kind": "Range",
                "dataType": "Number",
                "precision": -1
            },
            {
                "kind": "Range",
                "dataType": "String",
                "precision": -1
            }
        ]

Is that a normal behaviour or it is related to my data? Thanks.

I checked query metrics, and for 4k records query to cosmosDB is executed in 100ms. I would like to ask you is it normal behaviour that

var option = new FeedOptions { PartitionKey = new PartitionKey(partitionKey),  MaxItemCount = -1};

var query= client.CreateDocumentQuery<MyModel>(collectionLink, option)
            .Where(tl => tl.StartDate >= DateTimeToUnixTimestamp(startDate) && tl.EndDate <= DateTimeToUnixTimestamp(endDate))
        .AsEnumerable().ToList();

this query returns 10k results (in Postman its around 9MB size) in 10-12s? This partition contains around 50k records.

Retrieved Document Count : 12,356
Retrieved Document Size : 12,963,709 bytes
Output Document Count : 3,633
Output Document Size : 3,819,608 bytes
Index Utilization : 29.00 %
Total Query Execution Time : 264.31 milliseconds

Query Compilation Time : 0.12 milliseconds

Logical Plan Build Time : 0.07 milliseconds

Physical Plan Build Time : 0.06 milliseconds

Query Optimization Time : 0.01 milliseconds

Index Lookup Time : 51.10 milliseconds

Document Load Time : 140.51 milliseconds

Runtime Execution Times

Query Engine Times : 55.61 milliseconds

System Function Execution Time : 0.00 milliseconds

User-defined Function Execution Time : 0.00 milliseconds

Document Write Time : 10.56 milliseconds

Client Side Metrics

Retry Count : 0

        Request Charge                         :          904.73 RUs

Upvotes: 0

Views: 3716

Answers (1)

Krishnan Sundaram
Krishnan Sundaram

Reputation: 1401

I am from the CosmosDB engineering team.

Since your collection has 70M records, I assume that the observed latency is only on the first roundtrip (or first page) of results. Note that the observed latency can also be improved by tweaking FeedOptions.MaxDegreeOfParallelism to -1 when executing the query.

Regarding the difference between the two queries themselves, please note that SELECT * without a filter is a full scan query, which is probably a bit faster to first return results, when compared to the other query with two filters, which does a little bit more work on the local indexes across all the partitions, which may explain the observed latency.

Regarding your other question, we no longer support the Hash indexing policy on new collections. Please see here: https://learn.microsoft.com/en-us/azure/cosmos-db/index-types#index-kind . We automatically convert Hash indexes to Range with full precision.

You may also fetch QueryMetrics for your query and analyze the results to figure out why you have latency. Details are here: https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-query-metrics#query-execution-metrics

Upvotes: 3

Related Questions