SQL Server Full Table Scan and Load

Question

For the purpose of this question, let's pretend I have the following table:

Transaction:

Id
ProductId
ProductName
City
State
Country
UnitCost
SellAmount
NumberOfTimesPurchased
Profit (NumberOfTimesPurchased * (SellAmount - UnitCost))

Basically, a single de-normalized table with a million plus rows in it. It is important to note that only two columns will ever by updated: Profit and NumberOfTimesPurchased. When a sale is made, the NumberOfTimesPurchased will be updated and the new profit amount will be re-calculated.

Now, I need to do some minimal reporting on this table, which consists of queries that aggregate and group. As an example:

SELECT 
    City, AVG(UnitCost), AVG(SellAmount), 
    SUM(NumberOfTimesPurchased), AVG(Profit) 
FROM 
    Transaction 
GROUP BY  
    City

SELECT 
    State, AVG(UnitCost), AVG(SellAmount), SUM(NumberOfTimesPurchased), 
    AVG(Profit) 
FROM 
    Transaction 
GROUP BY 
    State

SELECT 
    Country, AVG(UnitCost), AVG(SellAmount), SUM(NumberOfTimesPurchased), 
    AVG(Profit) 
FROM 
    Transaction 
GROUP BY 
    Country

SELECT 
    ProductId, ProductName, AVG(UnitCost), AVG(SellAmount), 
    SUM(NumberOfTimesPurchased), AVG(Profit) 
FROM 
    Transaction
GROUP BY 
    ProductId, ProductName

These queries are quick: ~1 second. However, I've noticed that under load, performance significantly drops (from 1 second up to a minute when there are 20+ concurrent requests), and I'm guessing the reason is that each query performs a full table scan.

I've attempted to use indexed views for each query, however my update statement performance takes a beating since each view needs to be rebuilt. On the same note, I've attempted to create covering indexes for each query, but again my update statement performance is not acceptable.

Assuming full table scans are the culprit, do I have any realistic options to get the query time down while keeping update performance at acceptable levels?

Note that I cannot use column store indexes (I'm using the cheaper version of Azure SQL Database). I'd also like to stay away from any sort of roll-up implementation, as I need the data available immediately.

Finally - the example above is not a completely accurate representation of my table. I have 20 or so different columns that can be 'grouped', and 6 columns that can be updated. No inserts or deletes.

wBob · Accepted Answer

Because there are no WHERE clauses on your queries, the database engine can nothing but a table scan (or clustered index scan which is really the same thing). If there were covering indexes with containing all the columns from your query, then the engine would prefer those. If your real queries have WHERE clauses, then appropriate indexing with those columns as the leading columns of the index might help.

But I think your problem lies elsewhere. As far as concurrency goes you haven't put enough money in the meter. According to the main service tiers doc, the Basic tier for Azure SQL Database is for:

... supporting typically one single active operation at a given time. Examples include databases used for development or testing, or small-scale infrequently used applications.

Therefore you might want to think about splashing out for Premium edition to support both your concurrency requirement and columnstore indexes, which are perfectly suited to this type of query. Just for fun, I created a test-rig based on AdventureWorksDW2012 to try and recreate your problem which is here. Query performance was atrocious (> 20 secs). I'd be surprised if you weren't getting DTU warnings on your portal:

An upgrade to Standard (S0-S2) did boost performance so you should experiment. You could look at scaling up for busy query times and down when not required.

This table also looks a bit like a fact table, so you might want to consider refactoring this as a fact / dimensional model then use Azure Analysis Services on top to bring that sub-second performance.

Coincidentally there is a feedback item you can vote for to bring columnstore to Standard tier:

https://feedback.azure.com/forums/217321-sql-database/suggestions/6878001-make-sql-column-store-feature-available-for-standa

Recent comments suggest it is "in the work queue" as at May 2017;

SQL Server Full Table Scan and Load

Answers (1)

Related Questions