Ken Smith
Ken Smith

Reputation: 20445

Slow performance for Azure Table Storage queries

I'm running a series of well-structured queries against Azure Table Storage that should, so far as I can tell, be returning sub-second. Indeed, if I run them manually (say, from the Azure tools inside Visual Studio), they are indeed returning immediately. But when I'm running them in production, they are sometimes taking upwards of 20-30 seconds to return.

Here's the C# code that I'm calling ATS with:

public async Task<IList<T>> FindAsync(string filter, int maxRecords = int.MaxValue, IList<string> columns = null)
{
    var returnList = new List<T>();
    try
    {
        Interlocked.Increment(ref _outstandingRequests);
        var query = new TableQuery<T>().Where(filter);
        if (columns != null && columns.Any())
        {
            query = query.Select(columns);
        }
        TableQuerySegment<T> querySegment = null;
        var sw = new Stopwatch();
        sw.Start();
        while (returnList.Count < maxRecords && (querySegment == null || querySegment.ContinuationToken != null))
        {
            try
            {
                await 3.RetriesAsync(async x =>
                {
                    querySegment = await
                        Table.ExecuteQuerySegmentedAsync(query,
                            querySegment != null ? querySegment.ContinuationToken : null);
                });
                returnList.AddRange(querySegment);
            }
            catch (Exception ex)
            {
                _logger.Error("Error executing ATS query; table:{0}; filter:{1}; error:{2}",
                    typeof(T).GetFriendlyTypeName(), filter, ex.CompleteMessage());
                throw;
            }
        }
        sw.Stop();
        if (sw.ElapsedMilliseconds > 10000)
        {
            var stat = new RepoOperationStats(filter, sw, returnList.Count, _outstandingRequests);
            _logger.Warn("Long-running {0} query: secs:{1:0.0}, rc:{2}, or:{3}, fi:{4}",
                typeof(T).GetFriendlyTypeName(), stat.Milliseconds / 1000d, stat.ResultCount, stat.OutstandingRequests, stat.Filter);
        }
    }
    finally
    {
        Interlocked.Decrement(ref _outstandingRequests);
    }
    return returnList;
}

And here's an example of an entity stored in the table:

enter image description here

All fairly straightforward. But in my logs, I'm seeing repeated errors like this:

Long-running AtsOrganizationEventSummaryByCookie query: secs:33.3, rc:0, or:94, fi:(PartitionKey eq '4306.www-detail-mercury-mars-skywatching-tips.html-get') and ((RowKey ge '2015.02.05.00000000-0000-0000-0000-000000000000') and (RowKey le '2015.02.07.00000000-0000-0000-0000-000000000000'))

In other words, it's taking 33 seconds to return zero rows. Note that it's hitting exactly one partition, and it should be able to do a simple seek on the row index within that partition. (And indeed, the same query returns immediately in other contexts.)

Is there some sort of throttling mechanism that I'm running into? I should note that I am calling these queries in parallel, so that at any given point in time, anywhere from a dozen to upwards of 100 queries could be outstanding. But it seems like both (a) my client, and (b) ATS should be able to handle that level of load.

Any suggestions for how to troubleshoot this?

Upvotes: 6

Views: 2331

Answers (1)

TDao
TDao

Reputation: 584

What is your filter? From the log you have, looks like you are doing a scan operation with

(RowKey ge '2015.02.05.00000000-0000-0000-0000-000000000000') and (RowKey le '2015.02.07.00000000-0000-0000-0000-000000000000')

Although RowKey is indexed, when you filter by a range of rowkeys with comparison like ge or le, it performs a scan instead, which can be very slow depending on the size of your table.

You can try loading the whole partition 4306.www-detail-mercury-mars-skywatching-tips.html-get in memory and do the filtering to see if it's faster.

Btw, from your entity data structure, looks like you are trying log event for accessing a web page. If you are, you may want to check out Application Insights. It's more suitable for telemetry logging.

Upvotes: 0

Related Questions