Azure SQL Server vs Table Storage performance

Question

I have read a lot about the performance for Azure Table Storage. My take away is , if only using partionkey Azure Table Storage is lightning fast. To test it I created two scenarios.

A SQL server on a basic price tier with 13 million rows. All columns are indexed. A table storage with 120000 rows. Both with same entity:

public class Item
{
    [Key]
    public int Id { get; set; }

    public string Path { get; set; }

    public bool Deleted { get; set; }

    public int JobId { get; set; }

    public DateTime Started { get; set; }

    public int DurationInMS { get; set; }

    public int Status { get; set; }
}

When I query the SQL server it returns 706326 rows in 28 seconds.

When I query table storage on the partion key it returns 100000 rows in 36,5 seconds.

I would expect table storage to be much faster. Specially because the table has much less data, and I am only using the partion key. Is SQL server really faster? I am surprised because of most articles stating Table Storage is so fast.

SQL Server query EF:

 var db = new CleanupDB(_config.DBConnection);

 var sw = new Stopwatch();
 sw.Start();
 var dd = db.Items.Where(p => p.JobId == 4).ToList();
 sw.Stop();
 var ms = sw.Elapsed.TotalMilliseconds;

Table storage query:

 CloudTable table = tableClient.GetTableReference("items");

 var q = from s in table.CreateQuery()
 where s.PartitionKey == "1"
 select s.JobId;

 var sw = new Stopwatch();
 sw.Start();
 var ee = q.ToList();
 sw.Stop();
 var ms = sw.Elapsed.TotalMilliseconds;

Any of you have other experience? I am missing something here or might SQL server just be faster? I think this specific scenario should favor Table Storage.

Ivan Glasenberg · Accepted Answer

Just for azure table storage(I'm not familiar with azure sql), it's not a good practice to query table storage by only using partition_key and it will execute partition scan which will take a little more time.

For azure table query, the performance from good to bad is: Point Query -> Range Query -> Partition Scan -> Table Scan.

The details are as below(you can also find it from this doc):

Point Query: A Point Query is the most efficient lookup to use and is recommended to be used for high-volume lookups or lookups requiring lowest latency. Such a query can use the indexes to locate an individual entity very efficiently by specifying both the PartitionKey and RowKey values. For example: $filter=(PartitionKey eq 'Sales') and (RowKey eq '2')

Range Query: It uses the PartitionKey and filters on a range of RowKey values to return more than one entity. The PartitionKey value identifies a specific partition, and the RowKey values identify a subset of the entities in that partition. For example: $filter=PartitionKey eq 'Sales' and RowKey ge 'S' and RowKey lt 'T'

Partition Scan: It uses the PartitionKey and filters on another non-key property and that may return more than one entity. The PartitionKey value identifies a specific partition, and the property values select for a subset of the entities in that partition. For example: $filter=PartitionKey eq 'Sales' and LastName eq 'Smith'

Table Scan: It does not include the PartitionKey and is very inefficient because it searches all of the partitions that make up your table in turn for any matching entities. It will perform a table scan regardless of whether or not your filter uses the RowKey. For example: $filter=LastName eq 'Jones'

Azure SQL Server vs Table Storage performance

Answers (1)

Related Questions