Reputation: 1
I find myself needing to retrieve matches for, on average, ~1.5m rows from a remote database. There are two tables (ITEM1 and ITEM2) that have dated item information. There should always be at least one record for an item in ITEM1, and there may be 0 to many records for the same item in ITEM2. I have to find the latest record from either table, and if it exists in ITEM2, use that information instead of ITEM1. #TEMPA is the table that has the initial ~1.5m ItemNumbers.
Below is the query:
SELECT GETDATE() AS DateElement, A.SourceStore, COALESCE(FR.original_cost,CO.original_cost) AS Cost
FROM #TEMPA A
INNER JOIN REMOTEDB.ITEM1 CO
ON CO.item_id = A.ItemNumber
AND CO.month_ending >= (SELECT MAX(month_ending) FROM REMOTEDB.ITEM1 CO2 WHERE CO2.item_id = A.ItemNumber)
LEFT JOIN REMOTEDB.ITEM2 FR
ON FR.item_id = A.ItemNumber
AND FR.month_ending >= (SELECT MAX(month_ending) FROM REMOTEDB.ITEM2 FR2 WHERE FR2.item_id = A.ItemNumber)
WHERE CO.item_id IS NOT NULL
OR FR.item_id IS NOT NULL
There are unique clustered indexes on item_id and month_ending on both ITEM tables. I realize the subqueries are probably a big performance hit, but I can't think of any other way to do it. Each item could potentially have a different max month_ending date. Currently it returns the correct information, but it takes ~2.6 hrs to do so. Any help in optimizing this query to perform better would be appreciated.
Edit: I should mention the query is also being run READ UNCOMMITTED already.
I tried both answer queries using ROW_NUMBER and they both ran in ~20 minutes on the remote server itself. Using my original query it finishes in ~2 minutes. My original query runs in ~17 minutes over linked server. I cancelled the other queries once they went over an hour.
Thoughts?
Answer Queries: http://content.screencast.com/users/CWhittem/folders/Jing/media/ed55352b-9799-4dec-94f0-764e2670884f/2014-07-09_0957.png
Original Query: http://content.screencast.com/users/CWhittem/folders/Jing/media/4991aa7d-a05c-4fb1-afad-52b07f896d5e/2014-07-09_1014.png
Thanks!
Upvotes: 0
Views: 475
Reputation: 1
So after much testing and experimentation I have come up with the following that outperforms everything else I have tried:
SELECT DISTINCT oInv.Item_ID, oInv.Month_Ending, oInv.Original_Cost
FROM (
SELECT Item_ID, Month_Ending, Original_Cost
FROM ho_data.dbo.CO_Ho_Inven
UNION ALL
SELECT Item_ID, Month_Ending, Original_Cost
FROM ho_data.dbo.FR_Ho_Inven
) OInv
INNER JOIN (
SELECT UInv.Item_ID, MAX(UInv.Month_ending) AS Month_Ending, MAX(original_cost) AS original_cost
FROM (
SELECT Item_ID, Month_Ending, original_cost
FROM ho_data.dbo.CO_Ho_Inven
UNION ALL
SELECT Item_ID, Month_Ending, original_cost
FROM ho_data.dbo.FR_Ho_Inven
) UInv
GROUP BY UInv.Item_ID
) UINv
ON OInv.Item_ID = UInv.Item_ID
AND OInv.Month_Ending = UInv.Month_Ending
AND OInv.original_cost = UINv.original_cost
Upvotes: 0
Reputation: 60462
Rewrite the Correlated Subqueries using MAX with ROW_NUMBERs:
SELECT GETDATE() AS DateElement, A.SourceStore,
COALESCE(FR.original_cost,CO.original_cost) AS Cost
FROM #TEMPA A
INNER JOIN
(
SELECT *
FROM
(
SELECT original_cost,
item_id,
ROW_NUMBER() OVER (PARTITIOM BY item_id ORDER BY month_ending DESC) AS rn
FROM REMOTEDB.ITEM1
) as dt
WHERE rn = 1
) AS CO
ON CO.item_id = A.ItemNumber
LEFT JOIN
(
SELECT *
FROM
(
SELECT original_cost,
item_id,
ROW_NUMBER() OVER (PARTITIOM BY item_id ORDER BY month_ending DESC) AS rn
FROM REMOTEDB.ITEM2
) as dt
WHERE rn = 1
) as FR
ON FR.item_id = A.ItemNumber
Upvotes: 3
Reputation: 2785
If it is SQL Server 2008 or newer, try this...
;With OrderedItem1 As
(
Select Row_Number() Over (Partition By item_id Order By Month_Ending Desc) As recentOrderID,
item_id,
original_cost
From REMOTEDB.ITEM1
), OrderedItem2 As
(
Select Row_Number() Over (Partition By item_id Order By Month_Ending Desc) As recentOrderID,
item_id,
original_cost
From REMOTEDB.ITEM2
), maxItem1 As
(
Select item_id,
original_cost
From OrderedItem1
Wher recentOrderID = 1
), maxItem2 As
(
Select item_id,
original_cost
From OrderedItem2
Wher recentOrderID = 1
)
Select GetDate() As DateElement,
A.SourceStore,
IsNull(FR.original_cost,CO.original_cost) As Cost
From #TEMPA As A
Join maxItem1 As CO
On CO.item_id = A.ItemNumber
Left Join maxItem2 FR
On FR.item_id = A.ItemNumber
... you mention in the original post that there will always be a record for every item in ITEM1 so your WHERE CO.item_id Is Not Null OR FR.item_id Is Not Null
does nothing (on top of the fact you would filter them out with your inner join).
Upvotes: 1