Seb
Seb

Reputation: 58

SQL - Replacing slow "IN" statement with a JOIN

As per subject - I am trying to replace slow SQL IN statement with an INNER or LEFT JOIN. What I am trying to get rid of:

SELECT 
    sum(VR.Weight) 
FROM 
    verticalresponses VR
WHERE RespondentID IN
(
    SELECT RespondentID FROM verticalstackedresponses VSR WHERE VSR.Question = 'Brand     Aware'
)

The above I tried replacing with

SELECT 
    sum(VR.Weight) 
FROM 
    verticalresponses VR
    LEFT/INNER JOIN verticalstackedresponses VSR ON VSR.RespondentID = VR.RespondentID     AND VSR.Question = 'Brand Aware' 

but unfortunately I'm getting different results. Can anyone see why and if possible advise a solution that will do the job just quicker? Thanks a lot!

Upvotes: 1

Views: 1348

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1270021

The problem may be related to duplicates. Try this:

SELECT sum(VR.Weight) 
FROM verticalresponses VR inner join
     (select distinct RespondentID
      from verticalstackedresponses VSR
      where SR.Question = 'Brand Aware'
     ) vsr
     ON VSR.RespondentID = VR.RespondentID;

You want an inner join because you want to filter the VR table. A left join will keep everything in the first table.

Upvotes: 1

gbn
gbn

Reputation: 432311

  • A JOIN will multiply rows because it's an "Equi join"
  • IN (and EXISTS) will not multiply rows because these are "Semi joins"

Either way, you need suitable indexes, probably

  • verticalresponses, (RespondentID)
  • verticalstackedresponses, (Question, RespondentID)

See Using 'IN' with a sub-query in SQL Statements for more

Upvotes: 1

Ian Kenney
Ian Kenney

Reputation: 6426

The subquery

SELECT RespondentID FROM verticalstackedresponses VSR WHERE VSR.Question = 'Brand     Aware'

could maybe be returning multiple rows for any RespondentID, then you would get different results between join and in versions

Something along the lines of this may give the same results

SELECT 
    sum(VR.Weight) 
FROM 
    verticalresponses VR
JOIN(     SELECT distinct RespondentID FROM verticalstackedresponses 
          WHERE VSR.Question = 'Brand     Aware' 
    ) VSR
    ON  VSR.RespondentID = VR.RespondentID     

Upvotes: 2

Related Questions