Somayeh Ghazvinian

Reputation: 142

Azure DocumentDB query gets slow for more than 1000 calls

I am trying out Azure DocumentDB, and it all works fine. Compared with Azure Table storage, though, reading documents seems to get slow once I have more than 1000 documents.

Here is the snippet that I have:

public class DocumentDBProvider
{
    private static string EndpointUrl = "https://YourDocumentDbName.documents.azure.com:443/";
    private static string AuthorizationKey = "Take this code from your Azure Management Portal";
    private static string DatabaseName = "InterviewDB";
    private static string DocumentCollectionName = "InterviewCollection";

    public async Task<DocumentCollection> CreateDatabaseAndDocumentCollection()
    {
        var client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey);

        Database database = await client.CreateDatabaseAsync(new Database { Id = DatabaseName });

        DocumentCollection documentCollection = await client.CreateDocumentCollectionAsync(database.CollectionsLink,
                                                                                         new DocumentCollection { Id = DocumentCollectionName }
                                                                                           );
        return documentCollection;
    }

    public string GetDocumentLink()
    {
        var client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey);
        Database database = client.CreateDatabaseQuery().Where(db => db.Id == DatabaseName).AsEnumerable().FirstOrDefault();
        DocumentCollection documentCollection = client.CreateDocumentCollectionQuery(database.CollectionsLink).Where(db => db.Id == DocumentCollectionName).AsEnumerable().FirstOrDefault();
        return documentCollection.DocumentsLink;
    }
    public DocumentClient GetClient()
    {
        return new DocumentClient(new Uri(EndpointUrl), AuthorizationKey);
    }

    public List<Candidate> GetCandidateById(int candidateId)
    {
        var client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey);
        Database database = client.CreateDatabaseQuery().Where(db => db.Id == DatabaseName).AsEnumerable().FirstOrDefault();
        DocumentCollection documentCollection = client.CreateDocumentCollectionQuery(database.CollectionsLink).Where(db => db.Id == DocumentCollectionName).AsEnumerable().FirstOrDefault();

        return client.CreateDocumentQuery<Candidate>(documentCollection.DocumentsLink).Where(m => m.CandidateId == candidateId).Select(m => m).ToList();
    }

}

Any ideas what could make the GetCandidateById function slow when I call it 1000 times?

Upvotes: 0

Views: 624

Answers (2)

Andrew Liu

Reputation: 8119

As Aram mentioned, the code snippet above doesn't cache the collection self-link, so the method makes 3 network requests (1 to retrieve the database, 1 to retrieve the collection, and 1 to retrieve the document).

Caching the self-link for the collection can reduce the method to a single network request, which in turn greatly improves the performance of the method.

Id-based routing

Since the code snippet above retrieves the database and collection by id, another improvement I'd suggest is to use id-based routing. This means you can avoid having to query for the collection to retrieve a self-link.

The following is an example of performing a document delete operation using self-links:

// Get a Database by querying for it by id
Database db = client.CreateDatabaseQuery()
                    .Where(d => d.Id == "SalesDb")
                    .AsEnumerable()
                    .Single();

// Use that Database's SelfLink to query for a DocumentCollection by id
DocumentCollection coll = client.CreateDocumentCollectionQuery(db.SelfLink)
                                .Where(c => c.Id == "Catalog")
                                .AsEnumerable()
                                .Single();

// Use that Collection's SelfLink to query for a Document by id
Document doc = client.CreateDocumentQuery(coll.SelfLink)
                     .Where(d => d.Id == "prd123")
                     .AsEnumerable()
                     .Single();

// Now that we have a doc, use its SelfLink property to delete it
await client.DeleteDocumentAsync(doc.SelfLink);

Here is the same document delete logic using id-based routing (with a manually built string):

// Build up a link manually using ids
// If you are building up links manually, ensure that 
// the link does not end with a trailing '/' character
var docLink = string.Format("dbs/{0}/colls/{1}/docs/{2}", 
     "SalesDb", "Catalog", "prd123");

// Use this constructed link to delete the document
await client.DeleteDocumentAsync(docLink);

The SDK also includes a URI factory, which can be used in place of manually building a string:

// Use UriFactory to build the DocumentLink
Uri docUri = UriFactory.CreateDocumentUri("SalesDb", "Catalog", "prd123");

// Use this constructed Uri to delete the document
await client.DeleteDocumentAsync(docUri);
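
The same idea can be applied to the query in the question. The following is only a sketch, assuming an SDK version that supports id-based routing and exposes the CreateDocumentQuery overload that takes a Uri. The CandidateQueries class and the minimal Candidate class are hypothetical stand-ins, since the question does not show the real type. The database and collection ids are the ones from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure.Documents.Client;

// Hypothetical stand-in for the Candidate type used in the question.
public class Candidate
{
    public int CandidateId { get; set; }
}

// Hypothetical helper class, not part of the SDK.
public static class CandidateQueries
{
    public static List<Candidate> GetCandidateById(DocumentClient client, int candidateId)
    {
        // Build the collection link from ids; no query for the database
        // or the collection is needed to obtain it.
        Uri collectionUri = UriFactory.CreateDocumentCollectionUri(
            "InterviewDB", "InterviewCollection");

        // Only the document query itself goes over the network.
        return client.CreateDocumentQuery<Candidate>(collectionUri)
                     .Where(m => m.CandidateId == candidateId)
                     .ToList();
    }
}
```

The client is passed in rather than created inside the method, so the caller can reuse one instance across all 1000 calls.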

Check out the following blog post for more details: https://azure.microsoft.com/en-us/blog/azure-documentdb-bids-fond-farewell-to-self-links/

Upvotes: 1

Aram

Reputation: 5705

If you call this function (GetCandidateById) more than 1000 times in a load test or a loop, I'd guess the performance problem you are seeing comes from looking up the DocumentCollection and its DocumentsLink on every call.

When querying for documents in DocumentDB, you should cache the documentCollection.DocumentsLink value so that you don't have to query for the database and collection for each query.

This will reduce your query from 3 network round trips to 1.
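
A rough sketch of what that caching could look like, reusing the constants from the question. The class name CachedDocumentDBProvider and the minimal Candidate class are hypothetical. Sharing a single static DocumentClient is an extra assumption on my part, not something the question's code does. Lazy<string> is just one way to hold the cached link.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Hypothetical stand-in for the Candidate type used in the question.
public class Candidate
{
    public int CandidateId { get; set; }
}

// Hypothetical variant of the question's provider with a cached link.
public class CachedDocumentDBProvider
{
    private const string EndpointUrl = "https://YourDocumentDbName.documents.azure.com:443/";
    private const string AuthorizationKey = "Take this code from your Azure Management Portal";
    private const string DatabaseName = "InterviewDB";
    private const string DocumentCollectionName = "InterviewCollection";

    // Assumption: one shared client instead of a new one per call.
    private static readonly DocumentClient Client =
        new DocumentClient(new Uri(EndpointUrl), AuthorizationKey);

    // Resolved on first use, then reused by every later query.
    private static readonly Lazy<string> DocumentsLink =
        new Lazy<string>(ResolveDocumentsLink);

    // The two lookups from the question, now executed only once.
    private static string ResolveDocumentsLink()
    {
        Database database = Client.CreateDatabaseQuery()
            .Where(db => db.Id == DatabaseName)
            .AsEnumerable()
            .First();

        DocumentCollection collection = Client
            .CreateDocumentCollectionQuery(database.CollectionsLink)
            .Where(c => c.Id == DocumentCollectionName)
            .AsEnumerable()
            .First();

        return collection.DocumentsLink;
    }

    public List<Candidate> GetCandidateById(int candidateId)
    {
        // After the first call, only this document query hits the network.
        return Client.CreateDocumentQuery<Candidate>(DocumentsLink.Value)
            .Where(m => m.CandidateId == candidateId)
            .ToList();
    }
}
```

With this shape, the first call still pays for the database and collection lookups, and the remaining calls each make only the document query.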

Upvotes: 1
