Reputation: 5213
I have read a lot about the performance for Azure Table Storage. My take away is , if only using partionkey Azure Table Storage is lightning fast. To test it I created two scenarios.
A SQL server on a basic price tier with 13 million rows. All columns are indexed. A table storage with 120000 rows. Both with same entity:
public class Item
{
[Key]
public int Id { get; set; }
public string Path { get; set; }
public bool Deleted { get; set; }
public int JobId { get; set; }
public DateTime Started { get; set; }
public int DurationInMS { get; set; }
public int Status { get; set; }
}
When I query the SQL server it returns 706326 rows in 28 seconds.
When I query table storage on the partion key it returns 100000 rows in 36,5 seconds.
I would expect table storage to be much faster. Specially because the table has much less data, and I am only using the partion key. Is SQL server really faster? I am surprised because of most articles stating Table Storage is so fast.
SQL Server query EF:
var db = new CleanupDB(_config.DBConnection);
var sw = new Stopwatch();
sw.Start();
var dd = db.Items.Where(p => p.JobId == 4).ToList();
sw.Stop();
var ms = sw.Elapsed.TotalMilliseconds;
Table storage query:
CloudTable table = tableClient.GetTableReference("items");
var q = from s in table.CreateQuery<ItemItemEntity>()
where s.PartitionKey == "1"
select s.JobId;
var sw = new Stopwatch();
sw.Start();
var ee = q.ToList();
sw.Stop();
var ms = sw.Elapsed.TotalMilliseconds;
Any of you have other experience? I am missing something here or might SQL server just be faster? I think this specific scenario should favor Table Storage.
Upvotes: 2
Views: 1690
Reputation: 29940
Just for azure table storage(I'm not familiar with azure sql), it's not a good practice to query table storage by only using partition_key
and it will execute partition scan
which will take a little more time.
For azure table query, the performance from good to bad is: Point Query -> Range Query -> Partition Scan -> Table Scan.
The details are as below(you can also find it from this doc):
Point Query: A Point Query is the most efficient lookup to use and is recommended to be used for high-volume lookups or lookups requiring lowest latency. Such a query can use the indexes to locate an individual entity very efficiently by specifying both the PartitionKey and RowKey values. For example: $filter=(PartitionKey eq 'Sales') and (RowKey eq '2')
Range Query: It uses the PartitionKey and filters on a range of RowKey values to return more than one entity. The PartitionKey value identifies a specific partition, and the RowKey values identify a subset of the entities in that partition. For example: $filter=PartitionKey eq 'Sales' and RowKey ge 'S' and RowKey lt 'T'
Partition Scan: It uses the PartitionKey and filters on another non-key property and that may return more than one entity. The PartitionKey value identifies a specific partition, and the property values select for a subset of the entities in that partition. For example: $filter=PartitionKey eq 'Sales' and LastName eq 'Smith'
Table Scan: It does not include the PartitionKey and is very inefficient because it searches all of the partitions that make up your table in turn for any matching entities. It will perform a table scan regardless of whether or not your filter uses the RowKey. For example: $filter=LastName eq 'Jones'
Upvotes: 1