I am testing Cosmos DB and finding that the initial connection usually takes many seconds. I have written a small .NET Core 2.2 console app to demonstrate the problem:
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class Program
{
    static async Task Main(string[] args)
    {
        Console.WriteLine("Hello World!");
        string url = "https://docdb.documents.azure.com:443/";
        string key = "myKey";
        DocumentClient docClient = new DocumentClient(new Uri(url), key);
        Stopwatch sw = new Stopwatch();
        while (true)
        {
            // Time one point read of document "xxx" (database "ct", collection "ops").
            sw.Start();
            var res = await docClient.ReadDocumentAsync(UriFactory.CreateDocumentUri("ct", "ops", "xxx"),
                new RequestOptions { PartitionKey = new Microsoft.Azure.Documents.PartitionKey("test") });
            sw.Stop();
            Console.WriteLine($"Query took {sw.ElapsedMilliseconds}ms");
            sw.Reset();
            await Task.Delay(1000);
        }
    }
}
Here are my typical results:
Hello World!
Query took 48530ms
Query took 36ms
Query took 26ms
Query took 15ms
The first request takes 48 seconds! I tried amending the DocumentClient construction to:
DocumentClient docClient = new DocumentClient(new Uri(url), key,
    new ConnectionPolicy { ConnectionMode = ConnectionMode.Direct, ConnectionProtocol = Protocol.Tcp });
to see if it would help. Some typical results:
Hello World!
Query took 20536ms
Query took 104ms
Query took 37ms
Query took 71ms
Query took 13ms
Query took 88ms
Query took 14ms
The first query still takes about 20 seconds.
My database is 130 MB with an average throughput of 3.58 RU/s, and I have 400 RU/s provisioned.
Is there anything I can do to reduce the delay on the first connection?
My document in Cosmos DB is:
{
"id": "xxx",
"fid": "test",
"_rid": "VX8TAPKGDqNeWwEAAAAAAA==",
"_self": "dbs/VX8TAA==/colls/VX8TAPKGDqM=/docs/VX8TAPKGDqNeWwEAAAAAAA==/",
"_etag": "\"0000d4a2-0000-1100-0000-5ce801ef0000\"",
"_attachments": "attachments/",
"_ts": 1558708719
}
This is expected; see point #2 in https://learn.microsoft.com/en-us/azure/cosmos-db/performance-tips#networking
By default, the first request has a higher latency because it has to fetch the address routing table. To avoid this startup latency on the first request, you should call OpenAsync() once during initialization as follows.
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class Program
{
    static async Task Main(string[] args)
    {
        Console.WriteLine("Hello World!");
        string url = "https://docdb.documents.azure.com:443/";
        string key = "myKey";
        DocumentClient docClient = new DocumentClient(new Uri(url), key);
        // Fetch the address routing table once, up front, so the first real request doesn't pay for it.
        await docClient.OpenAsync();
        Stopwatch sw = new Stopwatch();
        while (true)
        {
            sw.Start();
            var res = await docClient.ReadDocumentAsync(UriFactory.CreateDocumentUri("ct", "ops", "xxx"),
                new RequestOptions { PartitionKey = new Microsoft.Azure.Documents.PartitionKey("test") });
            sw.Stop();
            Console.WriteLine($"Query took {sw.ElapsedMilliseconds}ms");
            sw.Reset();
            await Task.Delay(1000);
        }
    }
}
This could have three infrastructure causes:
Credential caching: upon receiving the initial connection, the server has to contact the authentication servers and wait for a reply. Once the reply is received, it is cached for subsequent connections.
Virtual machine spin-up: the first request starts a new VM instance, and the request is processed once that instance is running.
Database collection: the related records and indices are collected from the back-end databases and cached on the allocated server. This would become more apparent if multiple tables and indices are used in the initial query. The results are sent once the server has enough information to fulfill the request.
Subsequent queries will already have the information cached and will therefore be much faster.
The solution is to issue an asynchronous null query (a small request that does nothing) first, without waiting for the result, to prime the system for the real queries that follow.
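A minimal sketch of that priming idea, assuming you want the slow first call to overlap the rest of startup. `Warmup` and `EnsureWarmAsync` are hypothetical names, not part of any SDK; the priming call is passed in as a `Func<Task>`. With the Cosmos DB v2 SDK it could plausibly be `() => docClient.OpenAsync()`, though that choice is an assumption here.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not part of any SDK): starts a priming call at startup
// without blocking the caller, so the slow first request overlaps other startup work.
public sealed class Warmup
{
    private readonly Task _primeTask;

    public Warmup(Func<Task> prime)
    {
        // Start the priming request immediately and deliberately do not await it here.
        _primeTask = Task.Run(prime);
    }

    // True once the priming call has finished, successfully or not.
    public bool IsWarm => _primeTask.IsCompleted;

    // Optionally await this before the first latency-sensitive query. It returns at once
    // if priming already finished. Errors are swallowed because the warm-up is best-effort;
    // the real query will surface any genuine problem itself.
    public async Task EnsureWarmAsync()
    {
        try
        {
            await _primeTask.ConfigureAwait(false);
        }
        catch (Exception)
        {
            // Ignore: a failed warm-up should not fail the real query.
        }
    }
}
```

Usage would look like `var warmup = new Warmup(() => docClient.OpenAsync());` early in startup, then `await warmup.EnsureWarmAsync();` just before the first real read. If you truly don't care about the result, you can simply never await it.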
Depending on the application, you may want to send keep-alive requests during long periods of inactivity. This may get expensive, depending on the fee structure.
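One way to limit that cost is to probe only when the client has actually been idle. The sketch below is an assumption-laden illustration, not an SDK feature: `KeepAlive`, `MarkActivity` and the thresholds are hypothetical. The probe is any cheap request you pass in as a `Func<Task>`, for example a small point read. Each probe still consumes RUs.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of any SDK): sends a cheap probe request only when
// no real request has been made for at least idleThreshold. Every probe is a billed
// request, so pick the thresholds with the pricing model in mind.
public sealed class KeepAlive : IDisposable
{
    private readonly Func<Task> _probe;
    private readonly TimeSpan _idleThreshold;
    private readonly Timer _timer;
    private long _lastActivityTicks;
    private int _busy;        // 1 while a probe is in flight, to avoid overlapping probes
    private int _probeCount;  // number of probes that completed successfully

    public KeepAlive(Func<Task> probe, TimeSpan idleThreshold, TimeSpan checkInterval)
    {
        _probe = probe;
        _idleThreshold = idleThreshold;
        MarkActivity();
        // The timer callback only starts the async check; it does not block the timer thread.
        _timer = new Timer(state => { _ = CheckAsync(); }, null, checkInterval, checkInterval);
    }

    public int ProbeCount => Volatile.Read(ref _probeCount);

    // Call after every real request so probes are sent only during quiet periods.
    public void MarkActivity() =>
        Interlocked.Exchange(ref _lastActivityTicks, DateTime.UtcNow.Ticks);

    private async Task CheckAsync()
    {
        long last = Interlocked.Read(ref _lastActivityTicks);
        if (DateTime.UtcNow.Ticks - last < _idleThreshold.Ticks) return; // not idle long enough
        if (Interlocked.Exchange(ref _busy, 1) == 1) return;             // a probe is already running
        try
        {
            await _probe().ConfigureAwait(false);
            Interlocked.Increment(ref _probeCount);
        }
        catch (Exception)
        {
            // A failed probe is not fatal; a later check will try again.
        }
        finally
        {
            MarkActivity(); // restart the idle clock after each probe attempt
            Volatile.Write(ref _busy, 0);
        }
    }

    public void Dispose() => _timer.Dispose();
}
```

Because the idle clock is reset after every probe and every real request, a busy application sends no probes at all, and an idle one sends roughly one per `idleThreshold`.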
There are a lot of factors in play, most of which you have no control over: the size and amount of data in the partition, key organization and indices, server load, network congestion, data fragmentation, caching policies, and storage tiers (hot: immediate access and fast transfer, e.g. SSD or memory; warm: immediate access and slow transfer, e.g. HDD; cold: delayed access and slow transfer, e.g. powered-down HDD).
This is the main drawback of cloud-based technologies in general: you accept the risk of minor delays and give up some control in exchange for availability and resiliency.