Reputation: 407
I have a fairly common task: showing the history of sales in our web application (asp.net + mssql). I have a table of sale transactions with columns like:
- SellerID (string)
- Product PartNumber
- Product ManufacturerName
- ProductID (string, a unique normalized combination of PartNumber + ManufacturerName)
- Date of sale
- Price
- Qty
- Option 1
- Option 2
- Option 3
The options are specific attributes (like a contract number, etc.).
I need to display the sales data grouped by ProductID with aggregated sums of Qty and Amount. I also need to provide the ability to filter by SellerID, Date, and Options. So the user should see a table with:
- Part Number
- Manufacturer Name
- Sum(Qty)
- Sum(Price)
The user can also sort and filter by the displayed columns and page through the results.
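For reference, the "direct" query is something along these lines (table and column names are simplified for this example):

```
-- Sketch of the "direct" aggregation query; names like Sales and SaleDate
-- are simplified for illustration, not the exact schema.
SELECT   s.PartNumber,
         s.ManufacturerName,
         SUM(s.Qty)   AS TotalQty,
         SUM(s.Price) AS TotalAmount
FROM     Sales s
WHERE    s.SellerID = @SellerId
  AND    s.SaleDate BETWEEN @DateFrom AND @DateTo
  AND    s.Option1  = @Option1
GROUP BY s.ProductID, s.PartNumber, s.ManufacturerName
ORDER BY TotalQty DESC
-- OFFSET/FETCH paging needs SQL Server 2012+
OFFSET   @PageSize * (@PageNumber - 1) ROWS
FETCH NEXT @PageSize ROWS ONLY;
```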
At present we have approximately 5M sales records, and a "direct" query with such grouping, filtering, and sorting takes too much time (and that is before considering that several concurrent users may hit this web service).
To make it faster, we build a cache key from all the criteria used in the query and copy the whole result of the query into a cache table with the same schema (plus the cache key). But this has some disadvantages, such as rapid growth of the cache table and difficulties with creating indexes on it (they slow down the inserts).
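The caching looks roughly like this (again, all names are only illustrative):

```
-- Rough sketch of the current caching approach: the filter criteria are
-- hashed into a key and the matching rows are copied into a cache table
-- that mirrors the sales table plus the cache key.
DECLARE @CacheKey char(32) = CONVERT(char(32),
    HASHBYTES('MD5', CONCAT(@SellerId, '|', @DateFrom, '|', @DateTo, '|', @Option1)), 2);

IF NOT EXISTS (SELECT 1 FROM SalesCache WHERE CacheKey = @CacheKey)
BEGIN
    INSERT INTO SalesCache (CacheKey, SellerID, ProductID, PartNumber, ManufacturerName,
                            SaleDate, Price, Qty, Option1, Option2, Option3)
    SELECT @CacheKey, SellerID, ProductID, PartNumber, ManufacturerName,
           SaleDate, Price, Qty, Option1, Option2, Option3
    FROM   Sales
    WHERE  SellerID = @SellerId
      AND  SaleDate BETWEEN @DateFrom AND @DateTo
      AND  Option1  = @Option1;
END

-- Grouping, sorting and paging then run against SalesCache filtered by @CacheKey.
```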
I'm pretty sure this task is very common and well known for most business applications that work with sales data.
How do people solve all these issues?
UPD: I forgot to mention.
There are no ongoing inserts of sales data (we load it manually once a quarter).
I was thinking about OLAP, but I have never worked with it in practice. Does it make sense to use OLAP here?
We are not strongly tied to SQL Server; if it makes sense, we can use any other database.
Upvotes: 1
Views: 2411
Reputation: 1271231
The solution to your problem depends on the mix of queries and structure of your data.
For what you are describing, the natural format would be a star schema with a fact table in the middle. However, the fact table is probably pretty close to what you have now. The difference is the size of the record. A fact record would move much of the "readable" content to reference tables, so each record is as small as possible. It might look like:
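A minimal sketch, with assumed surrogate keys and data types:

```
-- Fact table sketch: only a date, integer keys and numeric measures
-- (names and types are assumptions, not the poster's actual schema).
CREATE TABLE FactSales (
    SaleDate   date  NOT NULL,
    SellerKey  int   NOT NULL,  -- -> DimSeller (the seller id string)
    ProductKey int   NOT NULL,  -- -> DimProduct (part number, manufacturer name)
    Option1Key int   NULL,      -- -> DimOption1, and so on for the other options
    Option2Key int   NULL,
    Option3Key int   NULL,
    Price      money NOT NULL,
    Qty        int   NOT NULL
);
```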
Things like the part number, manufacturer name, and option values would be in reference tables.
This, in itself, may get the fact table down to a size reasonable for your cache.
Next, start building indexes on it. You probably want several indexes, depending on the filtering criteria: (date, sellerid, productid, option1, option2, option3), (productid, date), and some more. I realize that indexes do require additional effort on inserts; the impact depends on the number of inserts per day. For a decision support system, you should be able to live with a "data lag", where you update the data periodically. Batching the inserts helps with the index build overhead.
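For example (a sketch against the fact-table layout above; pick the column orders that match your actual filters):

```
CREATE INDEX IX_FactSales_Date_Seller_Product
    ON FactSales (SaleDate, SellerKey, ProductKey, Option1Key, Option2Key, Option3Key);

CREATE INDEX IX_FactSales_Product_Date
    ON FactSales (ProductKey, SaleDate);
```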
If your requirements are for real-time reporting, then consider partitioning the data, so the most recent data is in a small partition. Partitioned indexes are smaller, so the overhead for inserting should be smaller.
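In SQL Server, date-based partitioning might look roughly like this (boundary dates and filegroup placement are placeholders):

```
-- Partition by quarter so the newest data sits in its own small partition.
CREATE PARTITION FUNCTION pfSalesByQuarter (date)
    AS RANGE RIGHT FOR VALUES ('2013-01-01', '2013-04-01', '2013-07-01');

CREATE PARTITION SCHEME psSalesByQuarter
    AS PARTITION pfSalesByQuarter ALL TO ([PRIMARY]);

-- The fact table (or its indexes) can then be created ON psSalesByQuarter(SaleDate).
```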
And, if your requirements really are heavy -- lots of real-time updates per minute, lots of real-time reporting slicing and dicing, and lots of queries requiring full history -- then invest in more memory so the tables easily fit in memory. Along the way, you can optimize the central data structure so it consists of ids and numeric quantities, with reference tables holding the additional data. Joins on primary keys are fast, and the central table ends up many times smaller than it would be if it stored all that data directly.
Upvotes: 2