Tamas

Reputation: 6420

How to estimate Windows Azure Table storage query performance?

I'd like to evaluate how my Windows Azure Table storage queries scale. For this purpose, I've put together a simple test environment where I can increase the amount of data in my table and measure the execution times of the queries. Based on these times, I'd like to define a cost function that could be used to evaluate the performance of future queries.
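
One simple way to turn such measurements into a cost function is to fit a linear model t ≈ a + b·n, where n is the number of entities in the table (or partition) and t is the measured query time. The sketch below does an ordinary least-squares fit. The linear form, the function names and the sample numbers are assumptions for illustration, not results from this test. A slope near zero would suggest a key lookup, while a clearly positive slope points to a scan.

```python
# Hypothetical sketch: fit t = a + b * n to (row_count, seconds) measurements
# with ordinary least squares. The sample numbers are made up for illustration.

def fit_linear_cost(samples):
    """Return (a, b) minimizing sum((a + b*n - t)^2) over (n, t) pairs."""
    k = len(samples)
    if k < 2:
        raise ValueError("need at least two measurements")
    mean_n = sum(n for n, _ in samples) / k
    mean_t = sum(t for _, t in samples) / k
    var_n = sum((n - mean_n) ** 2 for n, _ in samples)
    if var_n == 0:
        raise ValueError("row counts must not all be equal")
    cov = sum((n - mean_n) * (t - mean_t) for n, t in samples)
    b = cov / var_n           # seconds per additional entity
    a = mean_t - b * mean_n   # fixed per-query overhead
    return a, b

def predict(cost, n):
    """Estimated query time for a table/partition holding n entities."""
    a, b = cost
    return a + b * n

if __name__ == "__main__":
    # Made-up measurements of a scan-like query whose time grows with n.
    samples = [(1000, 0.05), (2000, 0.09), (4000, 0.17), (8000, 0.33)]
    cost = fit_linear_cost(samples)
    print(cost, predict(cost, 16000))
```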

I've evaluated the following queries:

  1. Query with PartitionKey and RowKey
  2. Query with PartitionKey and an attribute
  3. Query with PartitionKey and two RowKeys
  4. Query with PartitionKey and two attributes

For the last two queries I've checked the following two patterns:

  1. PartitionKey == "..." && (RowKey == "..." || RowKey == "...")
  2. (PartitionKey == "..." && RowKey == "...") || (PartitionKey == "..." && RowKey == "...")
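
The two patterns correspond to OData `$filter` expressions of the kind the Table service REST API accepts (`eq`, `and`, `or`). A small sketch that builds them as strings; the helper names are hypothetical, and it assumes every compared property is string-typed (other types need other literal forms).

```python
# Hypothetical helpers: build OData $filter strings for the two patterns.
# Assumes every compared property is string-typed.

def odata_str(value: str) -> str:
    """Quote a string literal for an OData filter (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"

def eq(column: str, value: str) -> str:
    return f"{column} eq {odata_str(value)}"

def pattern_1(pk, conditions):
    """PartitionKey == pk && (c1 || c2 || ...); conditions = [(column, value), ...]."""
    ors = " or ".join(eq(col, val) for col, val in conditions)
    return f"{eq('PartitionKey', pk)} and ({ors})"

def pattern_2(branches):
    """(PartitionKey == pk1 && c1) || (PartitionKey == pk2 && c2) || ...;
    branches = [(pk, column, value), ...]."""
    return " or ".join(
        f"({eq('PartitionKey', pk)} and {eq(col, val)})" for pk, col, val in branches
    )

if __name__ == "__main__":
    # Queries 3.1 and 3.2 (two RowKeys); use a property name instead of
    # "RowKey" to get the attribute variants 4.1 and 4.2.
    print(pattern_1("p1", [("RowKey", "r1"), ("RowKey", "r2")]))
    print(pattern_2([("p1", "RowKey", "r1"), ("p2", "RowKey", "r2")]))
```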

To minimize the transfer delay, I've executed the test on an Azure instance. From the measurements, I can see that

Can you explain the internals of the query/filter interpreter? Even if we accept that query 3.1 needs a partition scan, query 4.1 could also be evaluated with the same logic (and in the same time). Queries 3.2 and 4.2 seem like a mystery to me. Any pointers on those?

Obviously, the whole point of this is that I'd like to query distinct elements within a single query to minimize cost without losing performance. But it seems that using separate queries (with the Task Parallel Library) for each element is the only truly fast solution. What is the accepted way of doing this?

Upvotes: 4

Views: 1818

Answers (2)

AvkashChauhan

Reputation: 20576

With queries like 3.2 and 4.2, there will be a full partition scan, one partition after another, including the attribute filters. The query will not run in parallel even when these partitions are on two separate machines, which is why you see such long execution times. This is because Windows Azure Table storage does not optimize queries; it is the responsibility of your code to issue queries in a way that lets them run in parallel.
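
To make the difference concrete, here is a toy in-memory model, not the real service (its internals are not public): a PartitionKey + RowKey query is a direct lookup that touches one entity, while an OR filter is evaluated by scanning every entity, one partition after another. The table layout and function names are hypothetical.

```python
# Toy model only: illustrates "key lookup vs. sequential partition scan".
from typing import Callable, Dict, List, Optional, Tuple

Entity = Dict[str, str]
Table = Dict[str, Dict[str, Entity]]  # PartitionKey -> RowKey -> entity

def make_table(partitions: int, rows: int) -> Table:
    """Build a hypothetical table with `rows` entities in each partition."""
    return {f"p{i}": {f"r{j:04d}": {"Name": f"n{j}"} for j in range(rows)}
            for i in range(partitions)}

def point_lookup(table: Table, pk: str, rk: str) -> Tuple[Optional[Entity], int]:
    """PartitionKey + RowKey: one direct lookup, one entity examined."""
    return table.get(pk, {}).get(rk), 1

def scan(table: Table, partitions: List[str],
         predicate: Callable[[str, str, Entity], bool]) -> Tuple[List[Tuple[str, str]], int]:
    """Evaluate the filter against every entity, one partition after another."""
    matches, examined = [], 0
    for pk in partitions:
        for rk, entity in table.get(pk, {}).items():
            examined += 1
            if predicate(pk, rk, entity):
                matches.append((pk, rk))
    return matches, examined

if __name__ == "__main__":
    table = make_table(partitions=2, rows=1000)
    # Like pattern 2 across two partitions: both partitions are scanned in turn.
    wanted = {("p0", "r0001"), ("p1", "r0500")}
    print(scan(table, ["p0", "p1"], lambda pk, rk, e: (pk, rk) in wanted))
    # Two separate point lookups examine one entity each.
    print(point_lookup(table, "p0", "r0001"), point_lookup(table, "p1", "r0500"))
```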

You are right: if you want faster performance, you would need to run the queries in parallel using the Task Parallel Library.
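
The same idea (one point query per element, issued concurrently) can be sketched outside C# with Python's `ThreadPoolExecutor`. `fetch_entity` is a hypothetical stub that only sleeps to simulate a round trip; in real code it would wrap your storage client's point-query call (for example `TableClient.get_entity` in the `azure-data-tables` Python package).

```python
# Sketch of "one point query per element, run concurrently" (the TPL idea).
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

def fetch_entity(pk: str, rk: str) -> Dict[str, str]:
    """Hypothetical stand-in for a PartitionKey + RowKey point query.
    It sleeps to simulate one network round trip and echoes the keys."""
    time.sleep(0.05)
    return {"PartitionKey": pk, "RowKey": rk}

def fetch_many(keys: List[Tuple[str, str]], max_workers: int = 8) -> List[Dict[str, str]]:
    """Run one point query per key concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda key: fetch_entity(*key), keys))

if __name__ == "__main__":
    keys = [("p1", "r1"), ("p2", "r2"), ("p1", "r3"), ("p3", "r4")]
    start = time.perf_counter()
    results = fetch_many(keys)
    elapsed = time.perf_counter() - start
    # With enough workers, wall time should be close to one round trip
    # rather than len(keys) round trips.
    print(results, f"{elapsed:.3f}s")
```

The trade-off is one request per element, which is exactly the per-query cost the question hoped to avoid; what you gain is that latency no longer adds up across elements.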

Upvotes: 2

Ming Xu - MSFT

Reputation: 2116

Since the details of the Table storage internal implementation are not public, if you want to evaluate the performance of future queries, I would suggest checking http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx for some best practices.

Best Regards,

Ming Xu.

Upvotes: 1
