Reputation: 1138
I have a cosmos DB whose size is around 100GB. I successfully create a nice partition key, i have around 4600 partition on 70M records, but I still need to query on two datetime fields that are stored as a string, not in an epoch format.
Example json:
"someField1": "UNKNOWN",
"someField2": "DATA",
"endDate": 7014541201,
"startDate": 7054864502,
"someField3": "0",
"someField3": "0",
i notice when i do select * from tbl
and when i do select * from tbl where startDate > {someDate} AND endDate<{someDate1}
latency different is around 1s, so this filtering does not decrease my latency time.
Is it better to store date types as number? Does cosmos have better performance on epoch query ranges?
I am using SQL API.
Also when i try to add hash indexes on a startDate and endDate he basically convert that into two indexes. Example:
"path": "/startDate/?",
"indexes": [
{
"kind": "Hash",
"dataType": "String",
"precision": 3
}
]
},
this is converted to
"path": "/startDate/?",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Range",
"dataType": "String",
"precision": -1
}
]
Is that a normal behaviour or it is related to my data? Thanks.
I checked query metrics, and for 4k records query to cosmosDB is executed in 100ms. I would like to ask you is it normal behaviour that
var option = new FeedOptions { PartitionKey = new PartitionKey(partitionKey), MaxItemCount = -1};
var query= client.CreateDocumentQuery<MyModel>(collectionLink, option)
.Where(tl => tl.StartDate >= DateTimeToUnixTimestamp(startDate) && tl.EndDate <= DateTimeToUnixTimestamp(endDate))
.AsEnumerable().ToList();
this query returns 10k results (in Postman its around 9MB size) in 10-12s? This partition contains around 50k records.
Retrieved Document Count : 12,356
Retrieved Document Size : 12,963,709 bytes
Output Document Count : 3,633
Output Document Size : 3,819,608 bytes
Index Utilization : 29.00 %
Total Query Execution Time : 264.31 millisecondsQuery Compilation Time : 0.12 milliseconds
Logical Plan Build Time : 0.07 milliseconds
Physical Plan Build Time : 0.06 milliseconds
Query Optimization Time : 0.01 milliseconds
Index Lookup Time : 51.10 milliseconds
Document Load Time : 140.51 milliseconds
Runtime Execution Times
Query Engine Times : 55.61 milliseconds
System Function Execution Time : 0.00 milliseconds
User-defined Function Execution Time : 0.00 milliseconds
Document Write Time : 10.56 milliseconds
Client Side Metrics
Retry Count : 0
Request Charge : 904.73 RUs
Upvotes: 0
Views: 3716
Reputation: 1401
I am from the CosmosDB engineering team.
Since your collection has 70M records, I assume that the observed latency is only on the first roundtrip (or first page) of results. Note that the observed latency can also be improved by tweaking FeedOptions.MaxDegreeOfParallelism to -1 when executing the query.
Regarding the difference between the two queries themselves, please note that SELECT * without a filter is a full scan query, which is probably a bit faster to first return results, when compared to the other query with two filters, which does a little bit more work on the local indexes across all the partitions, which may explain the observed latency.
Regarding your other question, we no longer support the Hash indexing policy on new collections. Please see here: https://learn.microsoft.com/en-us/azure/cosmos-db/index-types#index-kind . We automatically convert Hash indexes to Range with full precision.
You may also fetch QueryMetrics for your query and analyze the results to figure out why you have latency. Details are here: https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-query-metrics#query-execution-metrics
Upvotes: 3