Reputation: 75
I have an Azure Table Storage (ATS) table whose PartitionKey and RowKey look like this:
PartitionKey RowKey
US_W|000000001 0000200325|0184921077191606273
US_W|000000004 0000200328|0184921077191606277
US_W|000000005 XXXXXXXXXX|XX(somenumbers)XXXX
To be clear, I only have the PartitionKey available when querying this table; the RowKey is unknown.
I retrieve the results from the table using the following method:
public async Task<IList<T>> FetchSelectedDataByPartitionKey<T>(string partitionKey, List<string> columns, QueryComparisonEnums partitionKeyQueryCompareEnums = QueryComparisonEnums.Equal) where T : class, ITableEntity, new()
{
    var tableClient = await GetTableClient<T>();
    string query = $"PartitionKey {partitionKeyQueryCompareEnums.GetAttribute<EnmDecriptionAttribute>()?.Value} '{partitionKey}'";
    AsyncPageable<T> queryResultsFilter = tableClient.QueryAsync<T>(filter: query, select: columns);
    List<T> result = new List<T>();
    await foreach (Page<T> page in queryResultsFilter.AsPages())
    {
        foreach (var qEntity in page.Values)
        {
            result.Add(qEntity);
        }
    }
    return result;
}
This function works fine, but it takes around 60 seconds to scan a huge set of data in this table, filter it, and fetch 75,000 entities. To speed things up, I already use the select parameter to fetch only the selected fields of each entity instead of the entire entity. I have read a few blog posts, such as ones on distributed scans of Azure Table Storage, but I believe that approach only helps when the PartitionKey values are more scattered.
How can I retrieve the data in a faster way? Any help is appreciated :)
Upvotes: 0
Views: 1576
Reputation: 590
In Azure Table Storage, a point query, which specifies both the PartitionKey and the RowKey, behaves like a lookup on a clustered index and is the most efficient kind of lookup. With both keys supplied, the storage service knows immediately which partition to query and can look up the RowKey within that partition.
But as you mentioned, the RowKey is unknown to you, so you are currently doing a partition scan, which uses the PartitionKey value plus any other filters.
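To make the difference concrete, here is a minimal sketch of the two query shapes using Azure.Data.Tables. The class and method names (LookupSketch, PointQueryAsync, PartitionScanAsync) are made up for illustration, and it assumes you already have a TableClient for the table:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Data.Tables;

// Hypothetical helper class, not part of the original code.
public static class LookupSketch
{
    // Point query: both keys are known, so the service fetches one entity directly.
    public static async Task<TableEntity> PointQueryAsync(
        TableClient table, string partitionKey, string rowKey)
    {
        var response = await table.GetEntityAsync<TableEntity>(partitionKey, rowKey);
        return response.Value;
    }

    // Partition scan: only the PartitionKey is known, so every entity
    // in that partition is read and returned.
    public static async Task<List<TableEntity>> PartitionScanAsync(
        TableClient table, string partitionKey)
    {
        // CreateQueryFilter quotes and escapes the interpolated value.
        string filter = TableClient.CreateQueryFilter($"PartitionKey eq {partitionKey}");
        var results = new List<TableEntity>();
        await foreach (TableEntity entity in table.QueryAsync<TableEntity>(filter))
        {
            results.Add(entity);
        }
        return results;
    }
}
```

Your method is the second shape, so its cost grows with the number of entities in the partition.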
As far as I understand, you can use pagination with a continuation token: set the maxPerPage parameter of the QueryAsync method, then pass the continuation token to the AsPages() method so that each call returns one page of data together with the token for the next page.
Below is sample code similar to yours. Note the maxPerPage and continuationToken parameters passed to the QueryAsync() and AsPages() methods, respectively:
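// Request at most 5 entities per page (use a larger value for real workloads).
var customers = _tableClient.QueryAsync<CustomerModel>(filter: "", maxPerPage: 5);
// Resume from continuationToken (null starts at the first page).
await foreach (var page in customers.AsPages(continuationToken))
{
    // Return only the first page, along with the token for the next one.
    return Tuple.Create<string, IEnumerable<CustomerModel>>(page.ContinuationToken, page.Values);
}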
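If the caller eventually needs every page, the snippet above has to be called repeatedly, feeding each returned token into the next call. Here is a rough sketch of that loop. PagingSketch, FetchAllAsync and the fetchPage delegate are hypothetical names; fetchPage stands in for a method that wraps the snippet above and returns the same Tuple shape:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper class, not part of any SDK.
public static class PagingSketch
{
    // fetchPage takes a continuation token (null for the first page) and
    // returns the next token plus the items of that page.
    public static async Task<List<T>> FetchAllAsync<T>(
        Func<string, Task<Tuple<string, IEnumerable<T>>>> fetchPage)
    {
        var all = new List<T>();
        string token = null;
        do
        {
            Tuple<string, IEnumerable<T>> page = await fetchPage(token);
            all.AddRange(page.Item2);
            token = page.Item1; // null or empty means there are no more pages
        } while (!string.IsNullOrEmpty(token));
        return all;
    }
}
```

Keep in mind that paging bounds the work done per call, but the total number of entities read from the partition stays the same, so it helps most when the caller can start processing or displaying results page by page.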
Upvotes: 0