Reputation: 105
Using SQL Server 2008.
I have multiple Locations which each contain multiple Departments which each contain multiple Items which can have zero to many Scans. Each Scan relates to a specific Operation which may or may not have a cutoff time. Each Item also belongs to a specific Package which belongs to a specific Job. Each job contains one or more Packages which contains one or more items.
+=============+ +=============+
| Locations | | Jobs |
+=============+ +=============+
^ ^
| |
+=============+ +=============+ +=============+
| Departments | <-- | Items | --> | Packages |
+=============+ +=============+ +=============+
^
|
+=============+ +=============+
| Scans | --> | Operations |
+=============+ +=============+
What I am attempting to do is get a count of Scans for a Job grouped by Location and Scan date. The tricky part is that I only want to count the first Scan by date/time per Item where the cutoff time for the Operation is not null. (NOTE: the scans definitely will NOT be in date/time order in the table.)
The query I have is getting me the correct results but it is painfully slow when the number of Items for a Job reaches 75,000 or so. I am pushing for a new server -- I know our hardware is lacking -- but I am wondering if there is something I am doing in the query that is bogging it down as well.
From what little I can glean from the execution plan, most of the cost of the query seems to be in the sub-query to find the first Scan for each Item. It does an index scan (0%) on an Operations table index (ID, Cutoff) and then a lazy spool (19%). It does an index seek (61%) on a Scans table index (ItemID, DateTime, OperationID, ID). The subsequent nested loops (inner join) is only 2% and the Top operator is 0%. (Not that I really understand much of what I just typed but I am trying to provide as much info as possible...)
Here is the query:
SELECT
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime))
, COUNT(Scans.ItemID) AS [COUNT]
FROM
Items
INNER JOIN Scans
ON Scans.ID =
(
SELECT TOP 1
Scans.ID
FROM
Scans
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
WHERE
Operations.Cutoff IS NOT NULL
AND Scans.ItemID = Items.ID
ORDER BY
Scans.DateTime
)
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
INNER JOIN Packages
ON Items.PackageID = Packages.ID
INNER JOIN Departments
ON Items.DepartmentID = Departments.ID
WHERE
Packages.JobID = @ID
GROUP BY
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime));
Which will return a sampling of results like so:
8 2012-06-08 00:00:00.000 11842
21 2012-06-07 00:00:00.000 502
11 2012-06-12 00:00:00.000 1841
15 2012-06-11 00:00:00.000 4314
16 2012-06-09 00:00:00.000 278
23 2012-06-12 00:00:00.000 1345
6 2012-06-06 00:00:00.000 2005
20 2012-06-08 00:00:00.000 352
14 2012-06-07 00:00:00.000 2408
8 2012-06-11 00:00:00.000 290
19 2012-06-10 00:00:00.000 85
20 2012-06-11 00:00:00.000 5484
7 2012-06-10 00:00:00.000 2389
16 2012-06-06 00:00:00.000 6762
18 2012-06-09 00:00:00.000 4473
14 2012-06-10 00:00:00.000 2364
1 2012-06-11 00:00:00.000 1531
22 2012-06-08 00:00:00.000 14534
5 2012-06-10 00:00:00.000 11908
9 2012-06-12 00:00:00.000 47
19 2012-06-07 00:00:00.000 559
7 2012-06-07 00:00:00.000 2576
Here's the execution plan (not sure what I changed since the original post but the cost % are slightly different. The bottleneck still seems to be in the same area though):
Upvotes: 1
Views: 5178
Reputation: 105
I am a little leery about marking this as the answer as I am sure we can still squeeze a little juice out of the query. But this did knock my test run from 22 seconds down to 6 seconds (with an added index on Scans: OperationID including DateTime and ItemID):
WITH CTE AS
(
SELECT
Items.ItemID AS ID
, Scans.DateTime AS [DateTime]
, Operations.Cutoff AS Cutoff
, ROW_NUMBER() OVER (PARTITION BY Items.ID ORDER BY Scans.DateTime) AS RN
FROM
Items
INNER JOIN Scans
ON Items.ID = Scans.ItemID
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
INNER JOIN Packages
ON Items.PackageID = Packages.ID
WHERE
Operations.Cutoff IS NOT NULL
AND Packages.JobID = @ID
)
SELECT
Departments.LocationID
, CTE.DateTime
, COUNT(Items.ID) AS COUNT
FROM
Items
INNER JOIN CTE
ON Items.ID = CTE.ID
AND CTE.RN = 1
INNER JOIN Packages
ON Items.PackageID = Packages.ID
INNER JOIN Departments
ON Items.DepartmentID = Departments.ID
WHERE
Packages.JobID = @ID
GROUP BY
Departments.LocationID
, CTE.DateTime
Upvotes: 1
Reputation: 19346
I'm curious - could you please run CROSS APPLY version?
SELECT
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, CA_Scans.DateTime))
, COUNT(CA_Scans.ItemID) AS [COUNT]
FROM
Items
CROSS APPLY
(
SELECT TOP 1
Scans.ID,
Scans.OperationID,
Scans.DateTime
FROM
Scans
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
WHERE
Operations.Cutoff IS NOT NULL
AND Scans.ItemID = Items.ID
ORDER BY
Scans.DateTime
) CA_Scans
INNER JOIN Operations
ON CA_Scans.OperationID = Operations.ID
INNER JOIN Packages
ON Items.PackageID = Packages.ID
INNER JOIN Departments
ON Items.DepartmentID = Departments.ID
WHERE
Packages.JobID = @ID
GROUP BY
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, CA_Scans.DateTime));
Upvotes: 0
Reputation: 51319
Its hard to say for sure, but something like this may behave better. I replaced your nested lookup with a ROW_NUMBER call. The problem in your original query is that nested lookup- its killing you.
Note I don't have SQL in front of me, so I cannot test it, but I think it is logically equivalent.
SELECT
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime))
, COUNT(Scans.ItemID) AS [COUNT]
FROM
Items
INNER JOIN Scans
ON Scans.ItemID = Items.ID
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
INNER JOIN Packages
ON Items.PackageID = Packages.ID
INNER JOIN Departments
ON Items.DepartmentID = Departments.ID
WHERE
Operations.Cutoff IS NOT NULL
AND
Packages.JobID = @ID
AND
ROW_NUMBER () OVER (PARTITION BY Items.ID ORDER BY Scans.DateTime) = 1
GROUP BY
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime));
Upvotes: 0