JOHN SMITHTY

Reputation: 23

Best way to run a large number of Azure Table queries?

I have an Azure table with over a million entries, and I am trying to run about 300,000 queries programmatically in C# to transfer some data to another system. Currently I am doing the following as I read through a file that contains the partition and row keys:

while (!reader.EndOfStream)
{
    // parse the reader to get partition and row keys
    string currentQuery = TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partKey),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, rowKey));
    TableQuery<MyEntity> query = new TableQuery<MyEntity>().Where(currentQuery);

    foreach (MyEntity entity in table.ExecuteQuery(query))
    {
        Console.WriteLine(entity.PartitionKey + ", " + entity.RowKey + ", " + entity.Timestamp.DateTime);
    }

    Thread.Sleep(25);
}

This takes a very long time to complete (5+ hours). From what I can see, each query takes around 200 milliseconds on average. I'm fairly new to Azure, so I figure I'm doing something wrong. How can I improve it?

Upvotes: 0

Views: 859

Answers (1)

David Makogon

Reputation: 71031

A few things:

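  1. Not sure why you have a sleep call in your loop. Unless you're being throttled (a storage account supports up to 20,000 transactions per second), you shouldn't need it. Over 300,000 iterations, the 25 ms sleep alone adds 7,500 seconds, just over two hours.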
  2. With a given partition key and row key, you'll get at most one entity back, since the PartitionKey + RowKey combination is unique. There's no need to loop through your results: you'll get either zero or one.
  3. You're taking a single-threaded approach, so it's highly unlikely you'll be able to push storage transaction rates very hard: at roughly 200 ms per round trip, one thread completes only about five lookups per second. Consider parallelizing your retrievals.
  4. I'm assuming you're not calling Console.WriteLine() in your actual app. If you are, it will slow you down as well.
  5. Consider disabling Nagle's algorithm via ServicePointManager.UseNagleAlgorithm = false; at startup, before the first storage request (the setting only affects connections created after the change). Otherwise, individual low-level calls to storage might be buffered for up to 500 ms so that TCP packets are packed more densely. This matters especially if you're spending cycles processing the content you read.
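
Points 2 and 3 above can be combined: issue one point read per key and keep a bounded number of them in flight at once. The following is a minimal sketch, not a drop-in replacement, and assumes .NET 6 or later (top-level statements, nullable annotations). LookupAllAsync and its lookupAsync delegate are hypothetical names, not part of any Azure SDK, and the dictionary only simulates the table. With the WindowsAzure.Storage library used in the question, the delegate could wrap a point read such as table.ExecuteAsync(TableOperation.Retrieve<MyEntity>(pk, rk)) and return (MyEntity)result.Result, which is null when the entity does not exist.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs one point lookup per (PartitionKey, RowKey) pair,
// keeping at most maxConcurrency lookups in flight. lookupAsync is a
// hypothetical delegate that returns the entity, or null when no entity has
// that key pair; null results are skipped. Results are in no particular order.
async Task<List<T>> LookupAllAsync<T>(
    IEnumerable<(string PartitionKey, string RowKey)> keys,
    Func<string, string, Task<T?>> lookupAsync,
    int maxConcurrency) where T : class
{
    using var throttle = new SemaphoreSlim(maxConcurrency, maxConcurrency);
    var found = new ConcurrentBag<T>();

    // One task per key; each waits for a free slot before issuing its request,
    // and releases the slot even if the lookup throws.
    List<Task> tasks = keys.Select(async key =>
    {
        await throttle.WaitAsync();
        try
        {
            T? entity = await lookupAsync(key.PartitionKey, key.RowKey);
            if (entity != null)
            {
                found.Add(entity);
            }
        }
        finally
        {
            throttle.Release();
        }
    }).ToList();

    await Task.WhenAll(tasks);
    return found.ToList();
}

// Usage: an in-memory dictionary stands in for the table (hypothetical data).
var fakeTable = new Dictionary<(string, string), string>
{
    [("p1", "r1")] = "entity-1",
    [("p1", "r2")] = "entity-2",
    [("p2", "r1")] = "entity-3",
};

var keysToFetch = new List<(string PartitionKey, string RowKey)>
{
    ("p1", "r1"),
    ("p2", "r1"),
    ("p9", "r9"), // no such entity, so it is skipped
};

List<string> results = await LookupAllAsync(
    keysToFetch,
    async (pk, rk) =>
    {
        await Task.Delay(10); // simulate one storage round trip
        return fakeTable.TryGetValue((pk, rk), out var entity) ? entity : null;
    },
    maxConcurrency: 32);

Console.WriteLine(string.Join(", ", results.OrderBy(r => r, StringComparer.Ordinal)));
// prints "entity-1, entity-3"
```

SemaphoreSlim.WaitAsync is used instead of Parallel.ForEach so that no thread sits blocked while a request is outstanding. On .NET Framework, also raise ServicePointManager.DefaultConnectionLimit at startup (it defaults to 2 outside ASP.NET), or the HTTP stack will queue requests regardless of maxConcurrency. The value 32 is an arbitrary starting point; tune it while watching for throttling responses such as HTTP 503 (Server Busy). Note that the sketch enumerates the entire key list immediately, creating one task per key.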

Upvotes: 2
