Richard
Richard

Reputation: 1252

SQL 2005 Query Optimisation

I have a SQL 2005 table consisting of around 10million records (dbo.Logs).

I have another table, dbo.Rollup that matches distinct dbo.Logs.URL to a FileId column in a third table, dbo.Files. The dbo.Rollup table forms the basis of various aggregate reports we run at a later stage.

Suffice to say for now, the problem I am having is in populating dbo.Rollup efficiently.

By definition, dbo.Logs has potentially tens of thousands of rows which all share the same URL field value. In our application, one URL can be matched to one dbo.Files.FileId. I.E. There is a many-to-one relationship between dbo.Logs.URL and dbo.Files.FileId (we parse the values of dbo.Logs to determine what the appropriate FileId is for a given URL).

My goal is to significantly reduce the amount of time it takes the first of three stored procedures that run in order to create meaningful statistics from our raw log data.

What I need is a specific example of how to refactor this SQL query to be much more efficient:

sp-Rollup-Step1:

INSERT INTO dbo.Rollup ([FileURL], [FileId])

SELECT 
 logs.RequestedFile As [URL],
 FileId = dbo.fn_GetFileIdFromURL(l.RequestedFile, l.CleanFileName)

FROM
 dbo.Logs l (readuncommitted) 

WHERE    

NOT EXISTS (
    SELECT
     FileURL
    FROM
     dbo.Rollup
    WHERE
     FileUrl = RequestedFile
)

fn_GetFileIdFromURL():

CREATE FUNCTION [dbo].[fn_GetFileIdFromURL] 
(       
    @URL nvarchar(500),
    @CleanFileName nvarchar(255)
)
RETURNS uniqueidentifier
AS
BEGIN

     DECLARE @id uniqueidentifier

     if (exists(select FileURL from dbo.[Rollup] where [FileUrl] = @URL))
     begin
        -- This URL has been seen before in dbo.Rollup.
            -- Retrieve the FileId from the dbo.Rollup table.
        set @id = (select top 1 FileId from dbo.[Rollup] where [FileUrl] = @URL)        
     end
     else
     begin
        -- This is a new URL. Hunt for a matching URL in our list of files,
            -- and return a FileId if a match is found.
        Set @id = (

            SELECT TOP 1
            f.FileId

            FROM
            dbo.[Files] f

            INNER JOIN
            dbo.[Servers] s on s.[ServerId] = f.[ServerId]

            INNER JOIN
            dbo.[URLs] u on 
                   u.[ServerId] = f.[ServerId]

            WHERE
                Left(u.[PrependURLProtocol],4) = left(@URL, 4)
            AND @CleanFileName = f.FileName  
     )

     end

     return @id

END

Key considerations:

In my own observations, it seems the slowest part of the query by far is in the stored procedure: the "NOT EXISTS" clause (I am not sure at this point whether that continually refreshes the table or not).

I'm looking for a specific solution (with examples using either pseudo-code or by modifying my procedures shown here) - answer will be awarded to those who provide it!

Thanks in advance for any assistance you can provide.

/Richard.

Upvotes: 0

Views: 170

Answers (3)

gbn
gbn

Reputation: 432210

Short answer is you have a CURSOR here. The scalar UDF is run per row of output.

The udf could be 2 LEFT JOINs onto derived tables. A rough outline:

...
COALESCE (F.xxx, L.xxx) --etc
...
FROM
 dbo.Logs l (readuncommitted)
 LEFT JOIN
 (select DISTINCT --added after comment
FileId, FileUrl from dbo.[Rollup]) R ON L.FileUrl = R.FileUrl
 LEFT JOIN
 (SELECT DISTINCT --added after comment
                f.FileId,
FileName ,
left(@PrependURLProtocol, 4) + '%' AS Left4
                FROM
                dbo.[Files] f

                INNER JOIN
                dbo.[Servers] s on s.[ServerId] = f.[ServerId]

                INNER JOIN
                dbo.[URLs] u on 
                           u.[ServerId] = f.[ServerId]
) F ON L.CleanFileName = R.FileName AND L.FileURL LIKE F.Left4
...

I'm also not sure if you need the NOT EXISTS because of how the udf works. If you do, make sure the columns are indexed.

Upvotes: 2

Ryan Michela
Ryan Michela

Reputation: 8374

INSERT INTO dbo.Rollup ([FileURL], [FileId])
SELECT  
 logs.RequestedFile As [URL], 
 FileId = dbo.fn_GetFileIdFromURL(l.RequestedFile, l.CleanFileName)
FROM dbo.Logs l (readuncommitted) LEFT OUTER JOIN dbo.Rollup
 on FileUrl = RequestedFile
WHERE FileUrl IS NULL

The logic here is that if dbo.Rollup does not exist for the given FileUrl, then the left outer join will turn up null. The NOT EXISTS now becomes an IS NULL, which is faster.

Upvotes: 0

Heiko Hatzfeld
Heiko Hatzfeld

Reputation: 3197

I think your hotspot is located here:

Left(u.[PrependURLProtocol],4) = left(@URL, 4)

This will cause the server to do a scan on the url table. You should not use a function on a field in a join clause. try to rewrite that to something like

... where PrependURLProtocol like left(@URL, 4) +"%"

And make sure you have an index on the field.

Upvotes: 1

Related Questions