Reputation: 407
I have a fairly common task: showing the history of sales in our web application (asp.net + mssql). I have a table of sale transactions with columns like:
- SellerID (string)
- Product PartNumber
- Product ManufacturerName
- ProductID (string, a unique normalized combination of PartNumber + ManufacturerName)
- Date of sale
- Price
- Qty
- Option 1
- Option 2
- Option 3
The options are specific attributes (like a contract number, etc.).
I need to display the sales data grouped by ProductID with aggregated sums of Qty and Amount. I also need to provide the ability to filter by SellerID, Date, and Options. So the user should see a table with:
- Part Number
- Manufacturer Name
- Sum(Qty)
- Sum(Price)
The user can also sort and filter by the displayed columns and page through the results.
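For reference, the "direct" query is something along these lines (table and column names are simplified for this example):

```
-- Sketch of the "direct" aggregation query; names like Sales and SaleDate
-- are simplified for illustration, not the exact schema.
SELECT   s.PartNumber,
         s.ManufacturerName,
         SUM(s.Qty)   AS TotalQty,
         SUM(s.Price) AS TotalAmount
FROM     Sales s
WHERE    s.SellerID = @SellerId
  AND    s.SaleDate BETWEEN @DateFrom AND @DateTo
  AND    s.Option1  = @Option1
GROUP BY s.ProductID, s.PartNumber, s.ManufacturerName
ORDER BY TotalQty DESC
-- OFFSET/FETCH paging needs SQL Server 2012+
OFFSET   @PageSize * (@PageNumber - 1) ROWS
FETCH NEXT @PageSize ROWS ONLY;
```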
At present we have approximately 5M sales records, and a "direct" query with such grouping, filtering, and sorting takes too much time (and that is before considering that several concurrent users may hit this web service).
To make it faster, we build a cache key from all the criteria used in the query and copy the whole result of the query into a cache table with the same schema (plus the cache key). But this has some disadvantages, such as rapid growth of the cache table and difficulties with creating indexes on it (they slow down the inserts).
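The caching looks roughly like this (again, all names are only illustrative):

```
-- Rough sketch of the current caching approach: the filter criteria are
-- hashed into a key and the matching rows are copied into a cache table
-- that mirrors the sales table plus the cache key.
DECLARE @CacheKey char(32) = CONVERT(char(32),
    HASHBYTES('MD5', CONCAT(@SellerId, '|', @DateFrom, '|', @DateTo, '|', @Option1)), 2);

IF NOT EXISTS (SELECT 1 FROM SalesCache WHERE CacheKey = @CacheKey)
BEGIN
    INSERT INTO SalesCache (CacheKey, SellerID, ProductID, PartNumber, ManufacturerName,
                            SaleDate, Price, Qty, Option1, Option2, Option3)
    SELECT @CacheKey, SellerID, ProductID, PartNumber, ManufacturerName,
           SaleDate, Price, Qty, Option1, Option2, Option3
    FROM   Sales
    WHERE  SellerID = @SellerId
      AND  SaleDate BETWEEN @DateFrom AND @DateTo
      AND  Option1  = @Option1;
END

-- Grouping, sorting and paging then run against SalesCache filtered by @CacheKey.
```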
I'm pretty sure this task is very common and well known for most business applications that work with sales data.
How do people solve all these issues?
UPD: I forgot to mention.
There are no ongoing inserts of sales data (we load it manually once a quarter).
I was thinking about OLAP, but I have never worked with it in practice. Does it make sense to use OLAP here?
We are not strongly tied to SQL Server; if it makes sense, we can use any other database.
Upvotes: 1
Views: 2411
Reputation: 1271231
The solution to your problem depends on the mix of queries and structure of your data.
For what you are describing, the natural format would be a star schema with a fact table in the middle. However, the fact table is probably pretty close to what you have now. The difference is the size of the record. A fact record would move much of the "readable" content to reference tables, so each record is as small as possible. It might look like:
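A minimal sketch, with assumed surrogate keys and data types:

```
-- Fact table sketch: only a date, integer keys and numeric measures
-- (names and types are assumptions, not the poster's actual schema).
CREATE TABLE FactSales (
    SaleDate   date  NOT NULL,
    SellerKey  int   NOT NULL,  -- -> DimSeller (the seller id string)
    ProductKey int   NOT NULL,  -- -> DimProduct (part number, manufacturer name)
    Option1Key int   NULL,      -- -> DimOption1, and so on for the other options
    Option2Key int   NULL,
    Option3Key int   NULL,
    Price      money NOT NULL,
    Qty        int   NOT NULL
);
```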
Things like the part number, manufacturer name, and option values would be in reference tables.
This, in itself, may get the fact table down to a size reasonable for your cache.
Next, start building indexes on it. You probably want several indexes, depending on the filtering criteria: (date, sellerid, productid, option1, option2, option3), (productid, date), and some more. I realize that indexes do require additional effort on inserts; the impact depends on the number of inserts per day. For a decision support system, you should be able to live with a "data lag", where you update the data periodically. Batching the inserts helps with the index build overhead.
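For example (a sketch against the fact-table layout above; pick the column orders that match your actual filters):

```
CREATE INDEX IX_FactSales_Date_Seller_Product
    ON FactSales (SaleDate, SellerKey, ProductKey, Option1Key, Option2Key, Option3Key);

CREATE INDEX IX_FactSales_Product_Date
    ON FactSales (ProductKey, SaleDate);
```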
If your requirements are for real-time reporting, then consider partitioning the data, so the most recent data is in a small partition. Partitioned indexes are smaller, so the overhead for inserting should be smaller.
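In SQL Server, date-based partitioning might look roughly like this (boundary dates and filegroup placement are placeholders):

```
-- Partition by quarter so the newest data sits in its own small partition.
CREATE PARTITION FUNCTION pfSalesByQuarter (date)
    AS RANGE RIGHT FOR VALUES ('2013-01-01', '2013-04-01', '2013-07-01');

CREATE PARTITION SCHEME psSalesByQuarter
    AS PARTITION pfSalesByQuarter ALL TO ([PRIMARY]);

-- The fact table (or its indexes) can then be created ON psSalesByQuarter(SaleDate).
```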
And, if your requirements really are heavy -- lots of real-time updates per minute, lots of real-time reporting slicing and dicing, and lots of queries requiring full history -- then invest in more memory so the tables easily fit in memory. Along the way, you can optimize the central data structure so it consists of ids and numeric quantities, with reference tables holding the additional data. Joins on primary keys are fast, and the central table ends up many times smaller than it would be if it stored all that data directly.
Upvotes: 2