Reputation: 8312
Okay so I made some changes to a stored procedure that we have, and it now takes 3 hours to run (it used to only take 10 minutes before). I have a temp table called #tCustomersEmail
. In it, is a column called OrderDate
, which has a lot of null values in it. I want to replace those null values with data from another database on a different server. So here's what I have:
I create another temp table:
Create Table #tSouth
(
CustID char(10),
InvcDate nchar(10)
)
Which I populate with this data:
INSERT INTO #tSouth(CustID, InvcDate)
SELECT DISTINCT
[CustID],
max(InvcDate) as InvcDate
FROM D3.SouthW.dbo.uc_InvoiceLine I
where EXISTS (SELECT CustomerNumber FROM #tCustomersEmail H WHERE I.CustID = H.CustomerNumber)
group BY I.CustID
Then I take the data from #tSouth
and update the OrderDate
in the #tCustomersEmail
table, as long as the CustomerNumber
matches up, and the OrderDate
is null:
UPDATE #tCustomersEmail
SET OrderDate = InvcDate
FROM #tCustomersEmail
INNER JOIN #tSouth ON #tCustomersEmail.CustomerNumber = [#tSouth].CustID
where #tCustomersEmail.OrderDate IS null
Making those changes caused the stored procedure to take FOR-EV-ER (Sandlot reference!)
So what am I doing wrong?
BTW I create indexes on my temp tables after I create them like so:
create clustered index idx_Customers ON #tCustomersEmail(CustomerNumber)
CREATE clustered index idx_CustSouthW ON #tSouth(CustID)
Upvotes: 2
Views: 387
Reputation: 21505
It's difficult to predict the behaviour of complex queries involving tables on a linked server, because the local server has no access to statistics for the remote table, and can end up with a poor query plan because of this - it will work on the assumption that the remote table has either 1 or 100 rows. If this wasn't bad enough, the result of the bad plan can be to pull the entire remote table over the wire into local temp space and work on it there. If the remote table is very large, this can be a major performance overhead.
In might be worth trying to simplify the linked server query to minimise the chances of the entire table being returned over the wire - (as has already been mentioned, you don't need both DISTINCT
and GROUP BY
)
INSERT INTO #tSouth(CustID, InvcDate)
SELECT [CustID],
max(InvcDate) as InvcDate
FROM D3.SouthW.dbo.uc_InvoiceLine I
group BY I.CustID
leaving the rest of the query unchanged.
However, because of the aggregate this may still bring the whole table back to the local server - you'll need to test to find out. Your best bet may be to encapsulate this logic in a view in the SouthW database, if you're able to create objects in it, then reference that from your SP code.
Upvotes: 2
Reputation: 100366
Maybe use table variable instead of temp table?
declare @temp table
(
CustID char(10),
InvcDate nchar(10)
)
insert into @temp
...
That definitely will increase the performance!
Upvotes: 3
Reputation: 44356
Try skipping the #tsouth table and use this query:
UPDATE a
SET OrderDate = (select max(InvcDate) from D3.SouthW.dbo.uc_InvoiceLine I
where a.customernumber = custid)
FROM #tCustomersEmail a
WHERE orderdate is null
I don't think the index will help you in this example
Upvotes: 3
Reputation: 10013
Distinct isn't needed if you have a GROUP BY. Given you are going across database I don't like the EXISTS. I would change that part to limit the number of rows at that point. Change to:
INSERT INTO #tSouth(CustID, InvcDate)
SELECT
[CustID],
max(InvcDate) as InvcDate
FROM D3.SouthW.dbo.uc_InvoiceLine I
where I.CustID in
(SELECT CustomerNumber
FROM #tCustomersEmail H
WHERE H.OrderDate IS null )
group BY I.CustID
EDIT: Looking closer are you sure uc_InvoiceLine should be used? Looks like there should be a parent table to that one that would had the date and have fewer rows. Also, you can skip the one temp table by doing the update directly:
UPDATE #tCustomersEmail
SET OrderDate = InvcDate
FROM #tCustomersEmail
INNER JOIN (SELECT
[CustID],
max(InvcDate) as InvcDate
FROM D3.SouthW.dbo.uc_InvoiceLine I
where I.CustID in
(SELECT CustomerNumber
FROM #tCustomersEmail H
WHERE H.OrderDate IS null )
group BY I.CustID) Invoices
ON #tCustomersEmail.CustomerNumber = Invoices.CustID
Upvotes: 2