keeehlan
keeehlan

Reputation: 8054

How to replace an INNER JOIN with an IN?

I've got a stored procedure that runs a couple queries in a WHILE loop, and these queries in particular take exponentially longer than the rest of the SP, greatly bogging down my performance.

I've read in a few places that using an IN vs. an INNER JOIN can improve performance if you're only pulling columns from the first table, which is exactly what I'm doing.

The problem, however, is that I haven't the first clue on how I can replace my INNER JOIN with an IN clause. I'm getting the notion that the syntax for an IN is very different from an INNER JOIN, but I don't know where to start.

As a footnote, SQL is not my strong suit. All suggestions are greatly appreciated.

Here's one of the queries. They're both pretty much identical in terms of logic.

SELECT @CurrentKey = riv.Key
FROM   ResourceInfo_VW riv
       INNER JOIN (SELECT Track,
                          Code,
                          [Language]
                   FROM   ResourceInfo_VW
                   WHERE  Key = @CurrentKey) AS riv2
         ON riv.Track = riv2.Track
            AND riv.Code = riv2.Code
            AND riv.[Language] = riv2.[Language]
       INNER JOIN UserGroupCourseCatalog_VW ugcc
         ON riv.Key = ugcc.Key
WHERE  riv.[Type] = 'a'
       AND ugcc.UserGroupKey = @UserGroupKey 

Upvotes: 1

Views: 6425

Answers (4)

keeehlan
keeehlan

Reputation: 8054

It took a bit of thinking, but I optimized this performance greatly by removing all of the INNER JOINs from the relevant statements.

Relative to the time it takes to run, this query gets faster and faster based on the number of records being processed.

Here's what I ended up with. You'll notice a few extra variables. These are set at the start of every WHILE iteration.

SELECT  @CurrentTrack = Track,
        @CurrentCode = Code,
        @CurrentLanguage = [Language]
FROM    ResourceInfo_VW riv
WHERE   riv.Key = @CurrentKey

SELECT      riv.Key
FROM        ResourceInfo_VW riv
WHERE       riv.[Type] = 'a'
        AND riv.Track = @CurrentTrack
        AND riv.Code = @CurrentCode
        AND riv.[Language] = @CurrentLanguage
        AND riv.CourseKey IN (SELECT CourseKey 
                              FROM  UserGroupCourseCatalog_VW 
                              WHERE UserGroupKey = @UserGroupKey)

Upvotes: 0

Anup Shah
Anup Shah

Reputation: 1254

Well very first thing I notice is there is a “correlated sub query” which is really not required. You have select in to variable that means only one value get selected at the end even if multiple KEY exists. In that case what key will get selected in variable is up to SQL server as there is no order by clause.

Consider adding Top and order by so you know in case of multiple row satisfying WHERE Conditions, what will be get in to result. Result should be at least consistent every time you run same query.

I would rewrite the query in as below. If proper indexes are there on table the performance should not be bad.

Note that TOP 1 is added because in the variable you cannot store multiple values anyways. So find out which ONE value you want to store. MIN, MAX, TOP1, etc….

Also I don’t see a need for doing SELF JOIN of the table “ResourceInfo_VW”. You can correct me if I am wrong here.

SELECT 
TOP 1 
@Key= riv.KEY
FROM ResourceInfo_VW riv
INNER JOIN UserGroupCourseCatalog_VW ugcc
ON riv.KEY = ugcc.KEY AND riv.[Type] = 'a'
WHERE  ugcc.UserGroupKey = @UserGroupKey

and just for the note, if you try to understand the logic of why this query is written in current way then only you can go to next step, in your case which is re-writing query using different operator.

Upvotes: 2

dana
dana

Reputation: 18155

One option is to create a table variable to store your intermediate results.

Start by inserting the @UserGroupKeys inside a while loop. Then run a single update ... select against your real data tables. This will be the most expensive part of the operation.

The table variable will now have lookup of UserGroupKey to Key and should perform much faster than searching the raw tables.

declare @t table (UserGroupKey nvarchar(255), Key nvarchar(255))
while (...)
begin

    insert into @t(UserGroupKey) values (@UserGroupKey)

end

-- this is the expensive part, but it only happens once
update @t
set Key = riv.KEY
FROM   ResourceInfo_VW riv
       INNER JOIN (SELECT Track,
                          Code,
                          [Language]
                   FROM   ResourceInfo_VW
                   WHERE  KEY = @Key) AS riv2
         ON riv.Track = riv2.Track
            AND riv.Code = riv2.Code
            AND riv.[Language] = riv2.[Language]
       INNER JOIN UserGroupCourseCatalog_VW ugcc
         ON riv.KEY = ugcc.KEY
WHERE  riv.[Type] = 'a'
       AND ugcc.UserGroupKey = @t.UserGroupKey 

while (...)
begin

    select @Key = Key
    from @t
    where @t.UserGroupKey = @UserGroupKey

end

Upvotes: 2

logixologist
logixologist

Reputation: 3834

Yes the formats are very different and they technically do different things: This is how a join is done:

SELECT * FROM TABLEA a JOIN TABLEB on a.commonfield = b.commonfield

This is how you use an IN statement

SELECT * FROM TABLEA WHERE commonfield in (SELECT commonfield from tableb)

Upvotes: 2

Related Questions