Andrew Bezzub

Reputation: 16032

Query azure table by collection of row keys

I need to look up several entities by a collection of row keys (all in one partition). What is the proper query for this?

Upvotes: 4

Views: 3977

Answers (3)

CageE

Reputation: 465

You can build the filter string at runtime and run it with ExecuteQuery, or with ExecuteQuerySegmentedAsync if you need async. For example, in C#:

// partitionKey (string) and rowKeys (List<string>) are your own values.
string rowKeyFilter = string.Join(" or ", rowKeys.Select(rk => $"(RowKey eq '{rk}')"));
string queryFilter = $"(PartitionKey eq '{partitionKey}') and ({rowKeyFilter})";

// Take caps the result at the number of keys requested; Select is optional
// and limits the columns returned (replace <column name> with your own).
var query = new TableQuery<ExternalTranslationEntity>()
    .Where(queryFilter)
    .Take(rowKeys.Count)
    .Select(new List<string> { "<column name>" });

// Page through the results; a null continuation token means no more segments.
TableContinuationToken? token = null;
do
{
    var segment = await _translationsTable.ExecuteQuerySegmentedAsync(query, token);

    foreach (var entity in segment.Results)
    {
        // do whatever you want with each entity
    }

    token = segment.ContinuationToken;
} while (token != null);
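
One caveat with building the filter by string interpolation: a key that contains a single quote will break the filter. In OData filter strings a single quote inside a string literal is escaped by doubling it. Below is a minimal sketch of a filter builder that does this. `FilterBuilder` and `BuildRowKeyFilter` are hypothetical names used for illustration, not part of the storage SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterBuilder
{
    // Escape a value for use inside an OData string literal: ' becomes ''.
    private static string Escape(string value) => value.Replace("'", "''");

    // Produces: (PartitionKey eq 'pk') and ((RowKey eq 'a') or (RowKey eq 'b'))
    public static string BuildRowKeyFilter(string partitionKey, IEnumerable<string> rowKeys)
    {
        var clauses = rowKeys.Select(rk => $"(RowKey eq '{Escape(rk)}')").ToList();
        if (clauses.Count == 0)
            throw new ArgumentException("At least one row key is required.", nameof(rowKeys));

        return $"(PartitionKey eq '{Escape(partitionKey)}') and ({string.Join(" or ", clauses)})";
    }
}
```

The returned string can be passed to `Where(...)` in place of `queryFilter`.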

Upvotes: 0

Brian Reischl

Reputation: 7356

It depends on what you want to optimize for. It turns out that specifying multiple rowkeys, even if they are all in the same partition, will result in a partition scan. The query optimizer just isn't good enough to handle OR queries. A partition scan can take tens to hundreds of milliseconds, depending on the size of the partition. It is always slower than point queries.

If you want to optimize for speed, run each lookup as a separate point query. Don't use the Task Parallel Library for this; use the begin/end functions instead, as they scale much better.

If latency is not a concern, do an OR query. It'll be slower, but it will count as one transaction so it will be cheaper.
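
As a rough sketch of the "one point query per row key" approach: the code below assumes your SDK version exposes Task-returning async methods (the newer counterpart to the begin/end pairs) and starts all lookups before awaiting any of them. `PointLookups`, `GetManyAsync`, and the `lookup` delegate are hypothetical names; the delegate stands in for whatever single-entity retrieve call your client offers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PointLookups
{
    // lookup is a hypothetical delegate wrapping one point query
    // (PartitionKey + RowKey); it should return null when the entity is missing.
    public static async Task<List<T>> GetManyAsync<T>(
        string partitionKey,
        IEnumerable<string> rowKeys,
        Func<string, string, Task<T>> lookup) where T : class
    {
        // Start every point query first, then await them all together.
        var pending = rowKeys.Select(rk => lookup(partitionKey, rk)).ToList();
        T[] results = await Task.WhenAll(pending);

        // Drop keys that had no matching entity.
        return results.Where(r => r != null).ToList();
    }
}
```

Keep in mind the trade-off described above: each lookup here is its own transaction, so this is faster but costs more than a single OR query.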

Upvotes: 5

David Makogon

Reputation: 71130

The issue with querying only by rowkey (which is how I'm reading the original question) is that you'll end up doing a table scan, as that rowkey could exist in any partition. And if you executed those queries individually, you'd end up doing a table scan for each one (even with the Task Parallel Library, as suggested by @GlennFerrieLive in a comment on the original question).

You could specify a range for the rowkey with $filter (as explained in this article), or a discrete list of row keys (limited to 15 individual comparisons within the filter). This should end up with just one table scan, but still... a table scan.

If it is at all possible to specify a partitionkey in your query, you should do so, as it will make your queries return much faster. OK, "much faster" is relative, as I have no idea how much data you're storing.

EDIT: Per the update via comment, since you know the partitionkey, you could follow the guidance above and specify either a rowkey range or discrete rowkeys within a single filter. Or, if you have many more rowkeys, you could consider executing these via TPL (which now makes sense given there's no table scan), either as a single rowkey per filter or grouped into ranges or filtered lists.
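
To illustrate the "grouping into filtered lists" option, here is a minimal sketch that splits a larger set of rowkeys into batches, so each batch can go into its own filter. The default of 15 is the per-filter comparison limit quoted above; if the partitionkey comparison counts toward that limit in your setup, pass a smaller size. `RowKeyBatches` and `Split` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class RowKeyBatches
{
    // Splits rowKeys into consecutive batches of at most batchSize items.
    // batchSize defaults to 15, the limit quoted in the answer (an assumption here).
    public static IEnumerable<List<string>> Split(IEnumerable<string> rowKeys, int batchSize = 15)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<string>(batchSize);
        foreach (var rk in rowKeys)
        {
            batch.Add(rk);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<string>(batchSize);
            }
        }

        // Emit the final, partially filled batch, if any.
        if (batch.Count > 0) yield return batch;
    }
}
```

Each batch can then be turned into one filter and the resulting queries run side by side.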

Upvotes: 3
