Reputation: 627
In Azure table storage, how can I query for a set of entities that match specific row keys in a partition???
I'm using Azure table storage and need to retrieve a set of entities that match a set of row keys within the partition.
Basically if this were SQL it may look something like this:
SELECT TOP 1 SomeKey
FROM TableName WHERE SomeKey IN (1, 2, 3, 4, 5);
I figured to save on costs and reduce doing a bunch of table retrieve operations that I could just do it using a table batch operation. For some reason I'm getting an exception that says:
"A batch transaction with a retrieve operation cannot contain any other operations"
Here is my code:
public async Task<IList<GalleryPhoto>> GetDomainEntitiesAsync(int someId, IList<Guid> entityIds)
{
try
{
var client = _storageAccount.CreateCloudTableClient();
var table = client.GetTableReference("SomeTable");
var batchOperation = new TableBatchOperation();
var counter = 0;
var myDomainEntities = new List<MyDomainEntity>();
foreach (var id in entityIds)
{
if (counter < 100)
{
batchOperation.Add(TableOperation.Retrieve<MyDomainEntityTableEntity>(someId.ToString(CultureInfo.InvariantCulture), id.ToString()));
++counter;
}
else
{
var batchResults = await table.ExecuteBatchAsync(batchOperation);
var batchResultEntities = batchResults.Select(o => ((MyDomainEntityTableEntity)o.Result).ToMyDomainEntity()).ToList();
myDomainEntities .AddRange(batchResultEntities );
batchOperation.Clear();
counter = 0;
}
}
return myDomainEntities;
}
catch (Exception ex)
{
_logger.Error(ex);
throw;
}
}
How can I achieve what I'm after without manually looping through the set of row keys and doing an individual Retrieve table operation for each one? I don't want to incur the cost associated with doing this since I could have hundreds of row keys that I want to filter on.
Upvotes: 6
Views: 5052
Reputation: 1339
I saw your post when I was looking for a solution, in my case I needed to be look up multiple ids at the same time.
Because there is no contains linq support (https://learn.microsoft.com/en-us/rest/api/storageservices/query-operators-supported-for-the-table-service) I just made a massive or equals chain.
Seems to be working for me so far hope it helps anyone.
public async Task<ResponseModel<ICollection<TAppModel>>> ExecuteAsync(
ICollection<Guid> ids,
CancellationToken cancellationToken = default
)
{
if (!ids.Any())
throw new ArgumentOutOfRangeException();
// https://learn.microsoft.com/en-us/rest/api/storageservices/query-operators-supported-for-the-table-service
// Contains not support so make a massive or equals statement...lol
var item = Expression.Parameter(typeof(TTableModel), typeof(TTableModel).FullName);
var expressions = ids
.Select(
id => Expression.Equal(
Expression.Constant(id.ToString()),
Expression.MakeMemberAccess(
Expression.Parameter(typeof(TTableModel), nameof(ITableEntity.RowKey)),
typeof(TTableModel).GetProperty(nameof(ITableEntity.RowKey))
)
)
)
.ToList();
var builderExpression = expressions.First();
builderExpression = expressions
.Skip(1)
.Aggregate(
builderExpression,
Expression.Or
);
var finalExpression = Expression.Lambda<Func<TTableModel, bool>>(builderExpression, item);
var result = await _azureTableService.FindAsync(
finalExpression,
cancellationToken
);
return new(
result.Data?.Select(_ => _mapper.Map<TAppModel>(_)).ToList(),
result.Succeeded,
result.User,
result.Messages.ToArray()
);
}
public async Task<ResponseModel<ICollection<TTableEntity>>> FindAsync(
Expression<Func<TTableEntity,bool>> filter,
CancellationToken ct = default
)
{
try
{
var queryResultsFilter = _tableClient.QueryAsync<TTableEntity>(
FilterExpressionTree(filter),
cancellationToken: ct
);
var items = new List<TTableEntity>();
await foreach (TTableEntity qEntity in queryResultsFilter)
items.Add(qEntity);
return new ResponseModel<ICollection<TTableEntity>>(items);
}
catch (Exception exception)
{
_logger.Error(
nameof(FindAsync),
exception,
exception.Message
);
// OBSFUCATE
// TODO PASS ERROR ID
throw new Exception();
}
}
Upvotes: 1
Reputation: 2085
I made a helper method to do it in a single request per partition.
Use it like this:
var items = table.RetrieveMany<MyDomainEntity>(partitionKey, nameof(TableEntity.RowKey),
rowKeysList, columnsToSelect);
Here's the helper methods:
public static List<T> RetrieveMany<T>(this CloudTable table, string partitionKey,
string propertyName, IEnumerable<string> valuesRange,
List<string> columnsToSelect = null)
where T : TableEntity, new()
{
var enitites = table.ExecuteQuery(new TableQuery<T>()
.Where(TableQuery.CombineFilters(
TableQuery.GenerateFilterCondition(
nameof(TableEntity.PartitionKey),
QueryComparisons.Equal,
partitionKey),
TableOperators.And,
GenerateIsInRangeFilter(
propertyName,
valuesRange)
))
.Select(columnsToSelect))
.ToList();
return enitites;
}
public static string GenerateIsInRangeFilter(string propertyName,
IEnumerable<string> valuesRange)
{
string finalFilter = valuesRange.NotNull(nameof(valuesRange))
.Distinct()
.Aggregate((string)null, (filterSeed, value) =>
{
string equalsFilter = TableQuery.GenerateFilterCondition(
propertyName,
QueryComparisons.Equal,
value);
return filterSeed == null ?
equalsFilter :
TableQuery.CombineFilters(filterSeed,
TableOperators.Or,
equalsFilter);
});
return finalFilter ?? "";
}
I have tested it for less than 100 values in rowKeysList
, however, if it even throws an exception if there are more, we can always split the request into parts.
Upvotes: 3
Reputation: 14309
I made my own ghetto link-table. I know it's not that efficient (maybe its fine) but I only make this request if the data is not cached locally, which only means switching devices. Anyway, this seems to work. Checking the length of the two arrays lets me defer the context.done();
var query = new azure.TableQuery()
.top(1000)
.where('PartitionKey eq ?', 'link-' + req.query.email.toLowerCase() );
tableSvc.queryEntities('linkUserMarker',query, null, function(error, result, response) {
if( !error && result ){
var markers = [];
result.entries.forEach(function(e){
tableSvc.retrieveEntity('markerTable', e.markerPartition._, e.RowKey._.toString() , function(error, marker, response){
markers.push( marker );
if( markers.length == result.entries.length ){
context.res = {
status:200,
body:{
status:'error',
markers: markers
}
};
context.done();
}
});
});
} else {
notFound(error);
}
});
Upvotes: 1
Reputation: 627
Not sure how I missed this in the first place, but here is a snippet from the MSDN documentation for the TableBatchOperation type:
A batch operation may contain up to 100 individual table operations, with the requirement that each operation entity must have same partition key. A batch with a retrieve operation cannot contain any other operations. Note that the total payload of a batch operation is limited to 4MB.
I ended up executing individual retrieve operations asynchronously as suggested by David Makogon.
Upvotes: 1
Reputation: 71110
With hundreds of row keys, that rules out using $filter
with a list of row keys (which would result in partial partition scan anyway).
With the error you're getting, it seems like the batch contains both queries and other types of operations (which isn't permitted). I don't see why you're getting that error, from your code snippet.
Your only other option is to execute individual queries. You can do these asynchronously though, so you wouldn't have to wait for each to return. Table storage provides upwards of 2,000 transactions / sec on a given partition, so it's a viable solution.
Upvotes: 1