Christopher
Christopher

Reputation: 1

Optimizing A Slow Complicated Remote SQL Query

I find myself needing to retrieve matches for, on average, ~1.5m rows from a remote database. There are two tables (ITEM1 and ITEM2) that have dated item information. There should always be at least one record for an item in ITEM1, and there may be 0 to many records for the same item in ITEM2. I have to find the latest record from either table, and if it exists in ITEM2, use that information instead of ITEM1. #TEMPA is the table that has the initial ~1.5m ItemNumbers.

Below is the query:

SELECT GETDATE() AS DateElement, A.SourceStore, COALESCE(FR.original_cost,CO.original_cost) AS Cost
FROM #TEMPA A
INNER JOIN REMOTEDB.ITEM1 CO
    ON  CO.item_id = A.ItemNumber 
    AND CO.month_ending >= (SELECT MAX(month_ending) FROM REMOTEDB.ITEM1 CO2 WHERE CO2.item_id = A.ItemNumber) 
LEFT JOIN REMOTEDB.ITEM2 FR
    ON  FR.item_id = A.ItemNumber 
    AND FR.month_ending >= (SELECT MAX(month_ending) FROM REMOTEDB.ITEM2 FR2 WHERE FR2.item_id = A.ItemNumber)
WHERE CO.item_id IS NOT NULL 
    OR FR.item_id IS NOT NULL

There are unique clustered indexes on item_id and month_ending on both ITEM tables. I realize the subqueries are probably a big performance hit, but I can't think of any other way to do it. Each item could potentially have a different max month_ending date. Currently it returns the correct information, but it takes ~2.6 hrs to do so. Any help in optimizing this query to perform better would be appreciated.

Edit: I should mention the query is also being run READ UNCOMMITTED already.

I tried both answer queries using ROW_NUMBER and they both ran in ~20 minutes on the remote server itself. Using my original query it finishes in ~2 minutes. My original query runs in ~17 minutes over linked server. I cancelled the other queries once they went over an hour.

Thoughts?

Answer Queries: http://content.screencast.com/users/CWhittem/folders/Jing/media/ed55352b-9799-4dec-94f0-764e2670884f/2014-07-09_0957.png

Original Query: http://content.screencast.com/users/CWhittem/folders/Jing/media/4991aa7d-a05c-4fb1-afad-52b07f896d5e/2014-07-09_1014.png

Thanks!

Upvotes: 0

Views: 475

Answers (3)

Christopher
Christopher

Reputation: 1

So after much testing and experimentation I have come up with the following that outperforms everything else I have tried:

SELECT DISTINCT oInv.Item_ID, oInv.Month_Ending, oInv.Original_Cost
FROM (
    SELECT Item_ID, Month_Ending, Original_Cost
    FROM ho_data.dbo.CO_Ho_Inven
    UNION ALL
    SELECT Item_ID, Month_Ending, Original_Cost
    FROM ho_data.dbo.FR_Ho_Inven
) OInv
INNER JOIN (
    SELECT UInv.Item_ID, MAX(UInv.Month_ending) AS Month_Ending, MAX(original_cost) AS original_cost
    FROM (
        SELECT Item_ID, Month_Ending, original_cost
        FROM ho_data.dbo.CO_Ho_Inven
        UNION ALL
        SELECT Item_ID, Month_Ending, original_cost
        FROM ho_data.dbo.FR_Ho_Inven
    ) UInv
    GROUP BY UInv.Item_ID
) UINv
ON OInv.Item_ID = UInv.Item_ID
AND OInv.Month_Ending = UInv.Month_Ending
AND OInv.original_cost = UINv.original_cost

Upvotes: 0

dnoeth
dnoeth

Reputation: 60462

Rewrite the Correlated Subqueries using MAX with ROW_NUMBERs:

SELECT GETDATE() AS DateElement, A.SourceStore, 
   COALESCE(FR.original_cost,CO.original_cost) AS Cost
FROM #TEMPA A
INNER JOIN
  (
    SELECT *
    FROM
     (
      SELECT original_cost,
          item_id,
          ROW_NUMBER() OVER (PARTITIOM BY item_id ORDER BY month_ending DESC) AS rn
      FROM REMOTEDB.ITEM1 
     ) as dt
    WHERE rn = 1
  ) AS CO
 ON  CO.item_id = A.ItemNumber 
LEFT JOIN 
  (
    SELECT *
    FROM
     (
      SELECT original_cost,
          item_id,
          ROW_NUMBER() OVER (PARTITIOM BY item_id ORDER BY month_ending DESC) AS rn
      FROM REMOTEDB.ITEM2 
     ) as dt
    WHERE rn = 1
   ) as FR
    ON  FR.item_id = A.ItemNumber

Upvotes: 3

Eric J. Price
Eric J. Price

Reputation: 2785

If it is SQL Server 2008 or newer, try this...

;With   OrderedItem1 As
(
        Select  Row_Number() Over (Partition By item_id Order By Month_Ending Desc) As recentOrderID,
                item_id,
                original_cost
        From    REMOTEDB.ITEM1
),      OrderedItem2 As
(
        Select  Row_Number() Over (Partition By item_id Order By Month_Ending Desc) As recentOrderID,
                item_id,
                original_cost
        From    REMOTEDB.ITEM2
),      maxItem1 As
(
        Select  item_id,
                original_cost
        From    OrderedItem1
        Wher    recentOrderID = 1
),      maxItem2 As
(
        Select  item_id,
                original_cost
        From    OrderedItem2
        Wher    recentOrderID = 1
)
Select  GetDate() As DateElement,
        A.SourceStore,
        IsNull(FR.original_cost,CO.original_cost) As Cost
From    #TEMPA As A
Join    maxItem1 As CO
        On  CO.item_id = A.ItemNumber
Left    Join maxItem2 FR
        On  FR.item_id = A.ItemNumber

... you mention in the original post that there will always be a record for every item in ITEM1 so your WHERE CO.item_id Is Not Null OR FR.item_id Is Not Null does nothing (on top of the fact you would filter them out with your inner join).

Upvotes: 1

Related Questions