jAntoni
jAntoni

Reputation: 621

Performance impact with huge data on Azure Table Storage

My Query

Should we expect slowness in retrieving data from storage within different Partitions in a specified period of time - say 1 hour - if the data in the Table Storage`Partitions within Table Storage` is expected to be very very very huge (say in millions)?

About My App

  1. My web app deals with receiving data for different signals from different devices.

  2. The frequency of data to be received from devices can be 1 min.

  3. The data thus received will be posted to Table Storage and displayed as and when received on a Dashboard.

  4. The data pertaining to a particular signal(s) within a selected period of time can also be queried to be displayed on page.

My problem

Currently the app is in testing and data comes in only whenever testing is happening. With this less amount of data, to query and fetch data from Table Storage it takes ~30 secs to fetch ~10,000 rows.

I have been reading here in different posts like Very Slow on Azure Table Storage Query on PartitionKey/RowKey List that says there is some delay in getting data from Table Storage.

So my query is

  1. When there are millions of data in a Table Storage \ within a Partition will a query to Table Storage make a complete table scan leading to heavy performance issue?

    1. One of my expected queries to retrieve data to display on my page is

    (((((((((((((PartitionKey eq 'D4AS1') or (PartitionKey eq 'D4AS2')) or (PartitionKey eq 'D4AS3')) or (PartitionKey eq 'D4AS4')) or (PartitionKey eq 'D4AS5')) or (PartitionKey eq 'D4AS6')) or (PartitionKey eq 'D4AS7')) or (PartitionKey eq 'D4AS8')) or (PartitionKey eq 'D4AS9')) or (PartitionKey eq 'D4AS10')) or (PartitionKey eq 'D4AS11')) or (PartitionKey eq 'D4AS133')) and (TimeReceived ge datetime'2018-02-21T23:53:40.4622407Z')) and (TimeReceived le datetime'2018-02-22T23:53:40.4622407Z') Should the above query be re framed for better performance? If so please suggest in what way it needs to be addressed?

    1. Whats the maximum delay we can expect on querying (simple\complex as above) Table Storage?

Upvotes: 0

Views: 1743

Answers (1)

Gaurav Mantri
Gaurav Mantri

Reputation: 136336

When there are millions of data in a Table Storage \ within a Partition will a query to Table Storage make a complete table scan leading to heavy performance issue?

If your query include PartitionKey, then it will not do table scan. It will however do a partition scan (unless you have issued a point query using both PartitionKey and RowKey).

One of my expected queries to retrieve data to display on my page is (((((((((((((PartitionKey eq 'D4AS1') or (PartitionKey eq 'D4AS2')) or (PartitionKey eq 'D4AS3')) or (PartitionKey eq 'D4AS4')) or (PartitionKey eq 'D4AS5')) or (PartitionKey eq 'D4AS6')) or (PartitionKey eq 'D4AS7')) or (PartitionKey eq 'D4AS8')) or (PartitionKey eq 'D4AS9')) or (PartitionKey eq 'D4AS10')) or (PartitionKey eq 'D4AS11')) or (PartitionKey eq 'D4AS133')) and (TimeReceived ge datetime'2018-02-21T23:53:40.4622407Z')) and (TimeReceived le datetime'2018-02-22T23:53:40.4622407Z') Should the above query be re framed for better performance? If so please suggest in what way it needs to be addressed?

As mentioned by Steve in the answer you have linked, this query is not really optimized. You should create multiple queries and fire them in parallel. Once the results from all queries come back, you should combine them on the client side and present it to your user.

Whats the maximum delay we can expect on querying (simple\complex as above) Table Storage?

From this link, maximum time allotted for execution of a query is 5 seconds and maximum time allotted for the query is 30 seconds. From this link:

A query against the Table service may return a maximum of 1,000 items at one time and may execute for a maximum of five seconds. If the result set contains more than 1,000 items, if the query did not complete within five seconds, or if the query crosses the partition boundary, the response includes headers which provide the developer with continuation tokens to use in order to resume the query at the next item in the result set. Continuation token headers may be returned for a Query Tables operation or a Query Entities operation.

Note that the total time allotted to the request for scheduling and processing the query is 30 seconds, including the five seconds for query execution.

Upvotes: 1

Related Questions