Reputation: 105
Using SQL Server 2008.
(Sorry if this turns out to be an article but I'm trying to give as much info as possible.)
I have multiple locations, each containing multiple departments, each containing multiple items, which can have zero to many scans. Each scan relates to a specific operation which may or may not have a cutoff time. Each item also belongs to a specific package, which belongs to a specific job, which belongs to a specific project, which belongs to a specific client. Each job contains one or more packages, which contain one or more items.
                    +=============+     +=============+
                    |  Projects   | --> |   Clients   |
                    +=============+     +=============+
                           ^
                           |
+=============+     +=============+
|  Locations  |     |    Jobs     |
+=============+     +=============+
       ^                   ^
       |                   |
+=============+     +=============+     +=============+
| Departments | <-- |    Items    | --> |  Packages   |
+=============+     +=============+     +=============+
                           ^
                           |
                    +=============+     +=============+
                    |    Scans    | --> | Operations  |
                    +=============+     +=============+
There are roughly 24,000,000 records in the Items table and roughly 48,000,000 in the Scans table. New items are sporadically bulk inserted throughout the day, usually in the tens of thousands at a pop. New scans are bulk inserted every hour, anywhere from a few hundred to a few hundred thousand per batch.
These tables are heavily queried, sliced and diced every which way. I was writing very specific stored procs but it turned into a maintenance nightmare, as I was on the verge of a hundred stored procs with no end in sight (e.g. something akin to ScansGetDistinctCountByProjectIDByDepartmentIDGroupedByLocationID, ScansGetDistinctCountByPackageIDByDepartmentIDGroupedByLocationID, etc.). As luck would have it, the requirements change (what feels like) almost daily, and every time I have to change/add/delete a column, well...I end up at the bar.
So I created an indexed view and a handful of generic stored procs with parameters to determine filtering and grouping. Unfortunately, performance went down the toilet. I guess the first question is, since select performance is paramount, should I just stick with the specific approach and fight through changes to the underlying tables? Or, can something be done to speed up the indexed view/generic query approach? On top of easing the maintenance nightmare, I was actually hoping that the indexed view would improve performance as well.
Here is the code to generate the view:
CREATE VIEW [ItemScans] WITH SCHEMABINDING AS
SELECT
      p.ClientID
    , p.ID AS [ProjectID]
    , j.ID AS [JobID]
    , pkg.ID AS [PackageID]
    , i.ID AS [ItemID]
    , s.ID AS [ScanID]
    , s.DateTime
    , o.Code
    , o.Cutoff
    , d.ID AS [DepartmentID]
    , d.LocationID
    -- other columns
FROM
    [Projects] AS p
    INNER JOIN [Jobs] AS j
        ON p.ID = j.ProjectID
    INNER JOIN [Packages] AS pkg
        ON j.ID = pkg.JobID
    INNER JOIN [Items] AS i
        ON pkg.ID = i.PackageID
    INNER JOIN [Scans] AS s
        ON i.ID = s.ItemID
    INNER JOIN [Operations] AS o
        ON s.OperationID = o.ID
    INNER JOIN [Departments] AS d
        ON i.DepartmentID = d.ID;
and the clustered index:
CREATE UNIQUE CLUSTERED INDEX [IDX_ItemScans] ON [ItemScans]
(
[PackageID] ASC,
[ItemID] ASC,
[ScanID] ASC
)
Here's one of the generic stored procs. It gets a count of items that have been scanned and have a cutoff:
CREATE PROCEDURE [ItemsGetFinalizedCount]
      @FilterBy int = NULL
    , @ID int = NULL
    , @FilterBy2 int = NULL
    , @ID2 sql_variant = NULL
    , @GroupBy int = NULL
WITH RECOMPILE
AS
BEGIN
    SELECT
        CASE @GroupBy
            WHEN 1 THEN CONVERT(sql_variant, LocationID)
            WHEN 2 THEN CONVERT(sql_variant, DepartmentID)
            -- other cases
        END AS [ID]
        , COUNT(DISTINCT ItemID) AS [COUNT]
    FROM
        [ItemScans] WITH (NOEXPAND)
    WHERE
        (@ID IS NULL OR
         @ID = CASE @FilterBy
                   WHEN 1 THEN ClientID
                   WHEN 2 THEN ProjectID
                   -- other cases
               END)
        AND (@ID2 IS NULL OR
             @ID2 = CASE @FilterBy2
                        WHEN 1 THEN CONVERT(sql_variant, ClientID)
                        WHEN 2 THEN CONVERT(sql_variant, ProjectID)
                        -- other cases
                    END)
        AND Cutoff IS NOT NULL
    GROUP BY
        CASE @GroupBy
            WHEN 1 THEN CONVERT(sql_variant, LocationID)
            WHEN 2 THEN CONVERT(sql_variant, DepartmentID)
            -- other cases
        END
END
The first time I ran the query and looked at the actual execution plan, I created the missing index that it suggested:
CREATE NONCLUSTERED INDEX [IX_ItemScans_Counts] ON [ItemScans]
(
[Cutoff] ASC
)
INCLUDE ([ClientID],[ProjectID],[JobID],[ItemID],[SegmentID],[DepartmentID],[LocationID])
Creating the index took the execution time down to about five seconds, but that is still unacceptable (the "specific" version of the query runs subsecond). I've tried adding different columns to the index instead of just including them, with no gain in performance (it doesn't help that I have no idea what I'm doing at this point).
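The variants I tried looked roughly like this (a sketch only; the index name and the exact choice/order of key columns varied between attempts):

```sql
-- One illustrative variant: promote a commonly-filtered column into the key
-- alongside Cutoff instead of only INCLUDE-ing it.
CREATE NONCLUSTERED INDEX [IX_ItemScans_Counts2] ON [ItemScans]
(
    [Cutoff] ASC,
    [ClientID] ASC
)
INCLUDE ([ProjectID],[JobID],[ItemID],[DepartmentID],[LocationID]);
```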
Here is the query plan:
And here are the details for that first index seek (it appears to return all of the rows in the view where Cutoff IS NOT NULL):
Upvotes: 4
Views: 3080
Reputation: 15251
A generic proc may not be a bad idea in this case, but you don't have to put all those cases into the final query as you are currently doing. I would try building your "specific queries" using dynamic SQL in your generic proc, much the same way Gail Shaw builds a "catch-all" query here:
SQL in the Wild - Catch-all queries
This way you get cacheable query plans that can actually use your indexes, as shown in the blog post, and you should be able to get the same sub-second performance you're after.
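Roughly, the idea looks like this (a minimal sketch assuming the column names from your view; the proc name and parameter set here are just for illustration, and you'd extend the CASE-free predicate building to cover your other filters):

```sql
-- Catch-all pattern: build only the predicates that apply, then run the
-- statement through sp_executesql so each filter combination gets its own
-- cached, parameterized plan.
CREATE PROCEDURE [ItemsGetFinalizedCountDynamic]
      @ClientID  int = NULL
    , @ProjectID int = NULL
    , @GroupBy   sysname   -- e.g. N'LocationID' or N'DepartmentID'
AS
BEGIN
    -- Whitelist the grouping column; never concatenate raw input into SQL.
    IF @GroupBy NOT IN (N'LocationID', N'DepartmentID')
    BEGIN
        RAISERROR(N'Invalid @GroupBy', 16, 1);
        RETURN;
    END

    DECLARE @sql nvarchar(max);
    SET @sql = N'SELECT ' + QUOTENAME(@GroupBy) + N' AS [ID]
        , COUNT(DISTINCT ItemID) AS [COUNT]
    FROM [ItemScans] WITH (NOEXPAND)
    WHERE Cutoff IS NOT NULL'
        + CASE WHEN @ClientID  IS NOT NULL THEN N' AND ClientID = @ClientID'   ELSE N'' END
        + CASE WHEN @ProjectID IS NOT NULL THEN N' AND ProjectID = @ProjectID' ELSE N'' END
        + N' GROUP BY ' + QUOTENAME(@GroupBy) + N';';

    EXEC sp_executesql @sql
        , N'@ClientID int, @ProjectID int'
        , @ClientID, @ProjectID;
END
```

Because the unused predicates are omitted entirely (rather than neutralized with `@x IS NULL OR ...` / CASE expressions), the optimizer can pick an index seek for whichever filter is actually present.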
Upvotes: 3