broke
broke

Reputation: 8312

How can I improve the performance of this stored procedure?

Okay so I made some changes to a stored procedure that we have, and it now takes 3 hours to run (it used to only take 10 minutes before). I have a temp table called #tCustomersEmail. In it, is a column called OrderDate, which has a lot of null values in it. I want to replace those null values with data from another database on a different server. So here's what I have:

I create another temp table:

Create Table #tSouth
(
CustID char(10),
InvcDate nchar(10)
)

Which I populate with this data:

INSERT INTO #tSouth(CustID, InvcDate)
SELECT DISTINCT
      [CustID],
      max(InvcDate) as InvcDate
  FROM D3.SouthW.dbo.uc_InvoiceLine I
  where EXISTS (SELECT CustomerNumber FROM #tCustomersEmail H WHERE   I.CustID =  H.CustomerNumber) 
  group BY I.CustID

Then I take the data from #tSouth and update the OrderDate in the #tCustomersEmail table, as long as the CustomerNumber matches up, and the OrderDate is null:

UPDATE #tCustomersEmail  
SET OrderDate = InvcDate 
FROM #tCustomersEmail  
INNER JOIN #tSouth ON #tCustomersEmail.CustomerNumber = [#tSouth].CustID
where #tCustomersEmail.OrderDate IS null

Making those changes caused the stored procedure to take FOR-EV-ER (Sandlot reference!)

So what am I doing wrong?

BTW I create indexes on my temp tables after I create them like so:

create clustered index idx_Customers ON #tCustomersEmail(CustomerNumber)
CREATE clustered index idx_CustSouthW ON #tSouth(CustID)

Upvotes: 2

Views: 387

Answers (4)

Ed Harper
Ed Harper

Reputation: 21505

It's difficult to predict the behaviour of complex queries involving tables on a linked server, because the local server has no access to statistics for the remote table, and can end up with a poor query plan because of this - it will work on the assumption that the remote table has either 1 or 100 rows. If this wasn't bad enough, the result of the bad plan can be to pull the entire remote table over the wire into local temp space and work on it there. If the remote table is very large, this can be a major performance overhead.

In might be worth trying to simplify the linked server query to minimise the chances of the entire table being returned over the wire - (as has already been mentioned, you don't need both DISTINCT and GROUP BY)

INSERT INTO #tSouth(CustID, InvcDate)
SELECT [CustID],
      max(InvcDate) as InvcDate
  FROM D3.SouthW.dbo.uc_InvoiceLine I
  group BY I.CustID

leaving the rest of the query unchanged.

However, because of the aggregate this may still bring the whole table back to the local server - you'll need to test to find out. Your best bet may be to encapsulate this logic in a view in the SouthW database, if you're able to create objects in it, then reference that from your SP code.

Upvotes: 2

abatishchev
abatishchev

Reputation: 100366

Maybe use table variable instead of temp table?

declare @temp table
(
    CustID char(10),
    InvcDate nchar(10)
)

insert into @temp
...

That definitely will increase the performance!

Upvotes: 3

t-clausen.dk
t-clausen.dk

Reputation: 44356

Try skipping the #tsouth table and use this query:

UPDATE a
SET OrderDate = (select max(InvcDate) from D3.SouthW.dbo.uc_InvoiceLine I 
                 where a.customernumber = custid)  
FROM #tCustomersEmail a  
WHERE orderdate is null

I don't think the index will help you in this example

Upvotes: 3

JBrooks
JBrooks

Reputation: 10013

Distinct isn't needed if you have a GROUP BY. Given you are going across database I don't like the EXISTS. I would change that part to limit the number of rows at that point. Change to:

INSERT INTO #tSouth(CustID, InvcDate)
SELECT 
      [CustID],
      max(InvcDate) as InvcDate
  FROM D3.SouthW.dbo.uc_InvoiceLine I
  where I.CustID in 
       (SELECT CustomerNumber 
        FROM #tCustomersEmail H 
        WHERE H.OrderDate IS null ) 
  group BY I.CustID

EDIT: Looking closer are you sure uc_InvoiceLine should be used? Looks like there should be a parent table to that one that would had the date and have fewer rows. Also, you can skip the one temp table by doing the update directly:

UPDATE #tCustomersEmail  
SET OrderDate = InvcDate 
FROM #tCustomersEmail  
INNER JOIN (SELECT 
      [CustID],
      max(InvcDate) as InvcDate
  FROM D3.SouthW.dbo.uc_InvoiceLine I
  where I.CustID in 
       (SELECT CustomerNumber 
        FROM #tCustomersEmail H 
        WHERE H.OrderDate IS null ) 
  group BY I.CustID) Invoices
ON #tCustomersEmail.CustomerNumber = Invoices.CustID

Upvotes: 2

Related Questions