Reputation: 45095

Azure Table Storage Parallel Query

If I don't specify a partition key and thus perform a scan of all partitions, do the scans automatically take place in parallel with each partition scanned concurrently?

Thanks.

Upvotes: 1

Answers (4)

Luke Puplett

Reputation: 45095

Months later, I'd like to add an answer discussing the performance impact of parallelising whole-table scans.

I used a 128-partition scheme, deriving each PartitionKey from the entity's GUID RowKey with a key generation algorithm that gave excellent distribution.
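The answer doesn't say which key generation algorithm was used. As a purely hypothetical illustration, one common way to spread GUID row keys evenly across 128 partitions is to hash the GUID and take the result modulo 128. The function name and the choice of SHA-256 are assumptions, not the author's method.

```python
import hashlib
import uuid

PARTITION_COUNT = 128  # the 128-partition scheme described in the answer


def partition_key_for(row_key: uuid.UUID, partitions: int = PARTITION_COUNT) -> str:
    """Derive a PartitionKey bucket from a GUID RowKey (hypothetical scheme)."""
    # Hash the GUID bytes so the bucket doesn't depend on how the GUID was generated.
    digest = hashlib.sha256(row_key.bytes).digest()
    bucket = int.from_bytes(digest[:4], "big") % partitions
    # Zero-pad to 3 digits (enough for up to 1000 buckets) so that string
    # ordering of the keys matches numeric bucket order.
    return f"{bucket:03d}"


row_key = uuid.uuid4()
entity = {"PartitionKey": partition_key_for(row_key), "RowKey": str(row_key)}
```

The zero padding matters because Table Storage compares keys as strings: without it, partition "10" would sort before partition "9".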

Empirical tests showed that a single-threaded query can perform far, far better than the parallel version in some situations. The table size and, I assume, how Azure has distributed the partitions seem to make a difference.

In short, it's an area that needs re-checking over the product's lifetime to see whether a different strategy would improve performance.

So I have added expected durations to the automated tests that run against the tables, so that any degradation flashes a red light telling me to go and check again.
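A minimal sketch of such a duration guard, assuming the table query is wrapped in a zero-argument callable that returns an iterable of entities. The helper name and any budget value are hypothetical, not the author's actual test code.

```python
import time
from typing import Callable, Iterable


def assert_query_within(query: Callable[[], Iterable], budget_seconds: float) -> float:
    """Run `query`, drain its results, and fail if it took longer than `budget_seconds`."""
    start = time.perf_counter()
    # Drain the iterable so lazily paged results are included in the timing.
    count = sum(1 for _ in query())
    elapsed = time.perf_counter() - start
    if elapsed > budget_seconds:
        raise AssertionError(
            f"query returned {count} entities in {elapsed:.2f}s; "
            f"budget was {budget_seconds:.2f}s"
        )
    return elapsed
```

In a test you might call `assert_query_within(scan_whole_table, budget_seconds=30.0)`, where `scan_whole_table` stands for whichever scan strategy (single-threaded or parallel) you currently use. Running both strategies through the same guard makes a regression in either one visible.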

Upvotes: 0

Herve Roggero

Reputation: 5249

As Gaurav says, it's not automatic. But that doesn't mean it can't be done.

You can query Azure Tables in parallel quite easily if you can make certain assumptions about your PartitionKey. For example, if your PartitionKey is a GUID, you can start 10 threads, each searching for your data in a different key range. Here is the filter you would use on the first thread, retrieving all entities in the half-open range [a, e), i.e. from 'a' inclusive to 'e' exclusive. You can tune this as needed and run 20 threads if you want.

(PartitionKey ge 'a' and PartitionKey lt 'e')
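The range-splitting idea can be sketched in Python as follows. This is a minimal sketch assuming lowercase hex GUID PartitionKeys (first character 0-9 or a-f) and a `run_query` callable that executes one range query; the helper names are hypothetical. With the `azure-data-tables` package, `run_query` might be `lambda r: table_client.query_entities(to_filter(r))`, but treat that wiring, and whether one client can safely be shared across threads, as assumptions to verify.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, Tuple

HEX_DIGITS = "0123456789abcdef"
# (inclusive lower bound, exclusive upper bound or None for "no upper bound")
KeyRange = Tuple[str, Optional[str]]


def hex_ranges(chunks: int) -> List[KeyRange]:
    """Split lowercase-hex PartitionKeys into `chunks` contiguous [lower, upper) ranges."""
    if not 1 <= chunks <= len(HEX_DIGITS):
        raise ValueError("chunks must be between 1 and 16")
    bounds = [HEX_DIGITS[i * len(HEX_DIGITS) // chunks] for i in range(chunks)]
    # The last range is open-ended so that keys starting with 'f' are not skipped.
    uppers: List[Optional[str]] = [*bounds[1:], None]
    return list(zip(bounds, uppers))


def to_filter(key_range: KeyRange) -> str:
    """Render a range as an OData filter in the same form as the one shown in this answer."""
    lower, upper = key_range
    if upper is None:
        return f"PartitionKey ge '{lower}'"
    return f"PartitionKey ge '{lower}' and PartitionKey lt '{upper}'"


def parallel_scan(run_query: Callable[[KeyRange], Iterable[dict]],
                  ranges: List[KeyRange]) -> List[dict]:
    """Run one query per range on its own thread and concatenate the results."""
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # list() drains each (possibly lazily paged) result set inside its worker thread.
        result_sets = pool.map(lambda r: list(run_query(r)), ranges)
        return [entity for result_set in result_sets for entity in result_set]
```

Splitting on the first hex character allows at most 16 ranges; to run 20 threads as suggested above you would have to split on the first two characters instead.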

If instead of GUIDs you use a non-unique value, such as a country name, you can simply start as many threads as you have countries.
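For a small, known set of PartitionKey values such as countries, a sketch of the same idea issues one equality query per key. It uses a bounded thread pool rather than literally one thread per country, so a long key list does not spawn hundreds of threads. `run_query` and the helper names are hypothetical. Single quotes inside a key are doubled, which is how OData escapes quotes in string literals.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def partition_filter(partition_key: str) -> str:
    """OData equality filter for one partition, with embedded single quotes escaped."""
    escaped = partition_key.replace("'", "''")
    return f"PartitionKey eq '{escaped}'"


def query_per_partition(run_query: Callable[[str], Iterable[dict]],
                        partition_keys: List[str],
                        max_workers: int = 8) -> Dict[str, List[dict]]:
    """Query each known partition concurrently; results are grouped by PartitionKey."""

    def fetch(query_filter: str) -> List[dict]:
        # Drain any paging inside the worker thread.
        return list(run_query(query_filter))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {key: pool.submit(fetch, partition_filter(key)) for key in partition_keys}
        return {key: future.result() for key, future in futures.items()}
```

Each query here targets a single partition, so it avoids a cross-partition scan entirely; whether that is actually faster still depends on the factors discussed in the other answers.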

The only case where you genuinely have to scan your entire Azure Table in one pass is when every entity has the same PartitionKey, and in that case you are probably facing a design issue.

Upvotes: 1

Jaxidian

Reputation: 13511

Gaurav Mantri is correct.

If you want to force it to be done in parallel, you'll have to filter by all possible PartitionKeys and then perform those queries in parallel yourself in code. This may or may not be "better" (faster/easier/cheaper) as it will depend on quite a few different things.

Ultimately, I wouldn't advise this for the typical situation. It's probably better that you organize your data differently.

Upvotes: 2

Gaurav Mantri

Reputation: 136136

The scan is done sequentially, starting from the first partition, because entities are stored in PartitionKey/RowKey order.

Upvotes: 5
