Reputation: 1252
I have a SQL 2005 table consisting of around 10million records (dbo.Logs).
I have another table, dbo.Rollup that matches distinct dbo.Logs.URL to a FileId column in a third table, dbo.Files. The dbo.Rollup table forms the basis of various aggregate reports we run at a later stage.
Suffice to say for now, the problem I am having is in populating dbo.Rollup efficiently.
By definition, dbo.Logs has potentially tens of thousands of rows which all share the same URL field value. In our application, one URL can be matched to one dbo.Files.FileId. I.E. There is a many-to-one relationship between dbo.Logs.URL and dbo.Files.FileId (we parse the values of dbo.Logs to determine what the appropriate FileId is for a given URL).
My goal is to significantly reduce the amount of time it takes the first of three stored procedures that run in order to create meaningful statistics from our raw log data.
What I need is a specific example of how to refactor this SQL query to be much more efficient:
sp-Rollup-Step1:
INSERT INTO dbo.Rollup ([FileURL], [FileId])
SELECT
logs.RequestedFile As [URL],
FileId = dbo.fn_GetFileIdFromURL(l.RequestedFile, l.CleanFileName)
FROM
dbo.Logs l (readuncommitted)
WHERE
NOT EXISTS (
SELECT
FileURL
FROM
dbo.Rollup
WHERE
FileUrl = RequestedFile
)
fn_GetFileIdFromURL():
CREATE FUNCTION [dbo].[fn_GetFileIdFromURL]
(
@URL nvarchar(500),
@CleanFileName nvarchar(255)
)
RETURNS uniqueidentifier
AS
BEGIN
DECLARE @id uniqueidentifier
if (exists(select FileURL from dbo.[Rollup] where [FileUrl] = @URL))
begin
-- This URL has been seen before in dbo.Rollup.
-- Retrieve the FileId from the dbo.Rollup table.
set @id = (select top 1 FileId from dbo.[Rollup] where [FileUrl] = @URL)
end
else
begin
-- This is a new URL. Hunt for a matching URL in our list of files,
-- and return a FileId if a match is found.
Set @id = (
SELECT TOP 1
f.FileId
FROM
dbo.[Files] f
INNER JOIN
dbo.[Servers] s on s.[ServerId] = f.[ServerId]
INNER JOIN
dbo.[URLs] u on
u.[ServerId] = f.[ServerId]
WHERE
Left(u.[PrependURLProtocol],4) = left(@URL, 4)
AND @CleanFileName = f.FileName
)
end
return @id
END
Key considerations:
In my own observations, it seems the slowest part of the query by far is in the stored procedure: the "NOT EXISTS" clause (I am not sure at this point whether that continually refreshes the table or not).
I'm looking for a specific solution (with examples using either pseudo-code or by modifying my procedures shown here) - answer will be awarded to those who provide it!
Thanks in advance for any assistance you can provide.
/Richard.
Upvotes: 0
Views: 170
Reputation: 432210
Short answer is you have a CURSOR here. The scalar UDF is run per row of output.
The udf could be 2 LEFT JOINs onto derived tables. A rough outline:
...
COALESCE (F.xxx, L.xxx) --etc
...
FROM
dbo.Logs l (readuncommitted)
LEFT JOIN
(select DISTINCT --added after comment
FileId, FileUrl from dbo.[Rollup]) R ON L.FileUrl = R.FileUrl
LEFT JOIN
(SELECT DISTINCT --added after comment
f.FileId,
FileName ,
left(@PrependURLProtocol, 4) + '%' AS Left4
FROM
dbo.[Files] f
INNER JOIN
dbo.[Servers] s on s.[ServerId] = f.[ServerId]
INNER JOIN
dbo.[URLs] u on
u.[ServerId] = f.[ServerId]
) F ON L.CleanFileName = R.FileName AND L.FileURL LIKE F.Left4
...
I'm also not sure if you need the NOT EXISTS because of how the udf works. If you do, make sure the columns are indexed.
Upvotes: 2
Reputation: 8374
INSERT INTO dbo.Rollup ([FileURL], [FileId])
SELECT
logs.RequestedFile As [URL],
FileId = dbo.fn_GetFileIdFromURL(l.RequestedFile, l.CleanFileName)
FROM dbo.Logs l (readuncommitted) LEFT OUTER JOIN dbo.Rollup
on FileUrl = RequestedFile
WHERE FileUrl IS NULL
The logic here is that if dbo.Rollup does not exist for the given FileUrl, then the left outer join will turn up null. The NOT EXISTS now becomes an IS NULL, which is faster.
Upvotes: 0
Reputation: 3197
I think your hotspot is located here:
Left(u.[PrependURLProtocol],4) = left(@URL, 4)
This will cause the server to do a scan on the url table. You should not use a function on a field in a join clause. try to rewrite that to something like
... where PrependURLProtocol like left(@URL, 4) +"%"
And make sure you have an index on the field.
Upvotes: 1