Justin Borromeo

Reputation: 1371

Efficiently retrieving large numbers of entities from Azure Table Storage

What are some ways to optimize the retrieval of large numbers of entities (~250K) from a single partition from Azure Table Storage to a .NET application?

Upvotes: 0

Views: 2087

Answers (1)

Brando Zhang

Reputation: 28387

As far as I know, there are two ways to optimize the retrieval of a large number of entities from a single Azure Table Storage partition in a .NET application.

1. If you don't need every property of the entity, I suggest you use server-side projection.

A single entity can have up to 255 properties and be up to 1 MB in size. When you query the table and retrieve entities, you may not need all the properties and can avoid transferring data unnecessarily (to help reduce latency and cost). You can use server-side projection to transfer just the properties you need.

From: Azure Storage Table Design Guide: Designing Scalable and Performant Tables (Server-side projection)

For more details, you could refer to the following code:

// Query the "Sales" partition but transfer only the Email property.
string filter = TableQuery.GenerateFilterCondition(
    "PartitionKey", QueryComparisons.Equal, "Sales");
List<string> columns = new List<string>() { "Email" };
TableQuery<EmployeeEntity> employeeQuery =
    new TableQuery<EmployeeEntity>().Where(filter).Select(columns);

var entities = employeeTable.ExecuteQuery(employeeQuery);
foreach (var e in entities)
{
    Console.WriteLine("RowKey: {0}, EmployeeEmail: {1}", e.RowKey, e.Email);
}

2. If you only need to display part of the data, you don't have to retrieve all the entities at once. You can fetch one segment of the results at a time and use the continuation token to request the next segment when you need it. This improves the query performance as perceived by the user.

A query against the table service may return a maximum of 1,000 entities at one time and may execute for a maximum of five seconds. If the result set contains more than 1,000 entities, if the query did not complete within five seconds, or if the query crosses the partition boundary, the Table service returns a continuation token to enable the client application to request the next set of entities. For more information about how continuation tokens work, see Query Timeout and Pagination.

From: Azure Storage Table Design Guide: Designing Scalable and Performant Tables (Retrieving large numbers of entities from a query)

By using continuation tokens explicitly, you can control when your application retrieves the next segment of data.

For more details, you could refer to the following code:

// Retrieve the "Sales" partition one segment at a time, passing the
// continuation token back to the service to request each next segment.
string filter = TableQuery.GenerateFilterCondition(
    "PartitionKey", QueryComparisons.Equal, "Sales");
TableQuery<EmployeeEntity> employeeQuery =
    new TableQuery<EmployeeEntity>().Where(filter);

TableContinuationToken continuationToken = null;

do
{
    var employees = employeeTable.ExecuteQuerySegmented(
        employeeQuery, continuationToken);
    foreach (var emp in employees)
    {
        ...
    }
    continuationToken = employees.ContinuationToken;
} while (continuationToken != null);
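For a result set as large as ~250K entities, you could also run the same segmented loop asynchronously with ExecuteQuerySegmentedAsync, so the calling thread isn't blocked while each segment is fetched over the network. This is a sketch, assuming the same employeeTable and EmployeeEntity used above:

// Sketch: async segmented retrieval (assumes the employeeTable,
// EmployeeEntity, and filter from the examples above).
public static async Task<List<EmployeeEntity>> GetSalesEmployeesAsync(
    CloudTable employeeTable)
{
    string filter = TableQuery.GenerateFilterCondition(
        "PartitionKey", QueryComparisons.Equal, "Sales");
    var query = new TableQuery<EmployeeEntity>().Where(filter);

    var results = new List<EmployeeEntity>();
    TableContinuationToken continuationToken = null;
    do
    {
        // Each call returns up to 1,000 entities without blocking the thread.
        TableQuerySegment<EmployeeEntity> segment =
            await employeeTable.ExecuteQuerySegmentedAsync(query, continuationToken);
        results.AddRange(segment.Results);
        continuationToken = segment.ContinuationToken;
    } while (continuationToken != null);

    return results;
}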

Besides, I suggest you pay attention to the table partition scalability targets.

Target throughput for a single table partition (1 KB entities): up to 2,000 entities per second

If you exceed the scalability targets for the partition, the storage service will throttle your requests.
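When throttling does occur, the client should back off and retry rather than fail. This is a sketch, assuming the Microsoft.WindowsAzure.Storage SDK, that attaches an exponential retry policy to the segmented query via TableRequestOptions:

// Sketch: exponential back-off for throttled requests (assumes the
// employeeTable and employeeQuery from the examples above).
var requestOptions = new TableRequestOptions
{
    // Retry up to 5 times, starting from a 2-second back-off interval.
    RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(2), 5)
};

TableContinuationToken continuationToken = null;
do
{
    var segment = employeeTable.ExecuteQuerySegmented(
        employeeQuery, continuationToken, requestOptions, operationContext: null);
    // ... process segment.Results ...
    continuationToken = segment.ContinuationToken;
} while (continuationToken != null);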

Upvotes: 3
