dsb
dsb

Reputation: 2517

Complicated WHERE clause vs. merging results with UNION, which is better?

I have some heavy query to make. I can perform it in one of either two ways:

  1. Combined and complicated WHERE clause:

    SELECT *  
    FROM Talks d  
    WHERE (d.UserId IN (SELECT SomeGuyUserId FROM Network WHERE MainUserId = @ViewerUserId AND IsX = 0 AND IsY = 0)  
    AND DATEDIFF(DAY,d.Started,GETDATE()) < 3  
    AND d.Status <= 100)  
    OR   
    (d.UserId IN (SELECT SomeGuyUserId FROM Network WHERE MainUserId = @ViewerUserId AND IsX = 1 AND IsY = 0)   
    AND DATEDIFF(DAY,d.Started,GETDATE()) < 3   
    AND d.Status <= 110)   
    ORDER BY d.UserId;   
    
  2. Two different SELECTS with UNION:

    SELECT *   
    FROM Talks d  
    WHERE (d.UserId IN (SELECT SomeGuyUserId FROM Network WHERE MainUserId = @ViewerUserId AND IsX = 0 AND IsY = 0)  
    AND DATEDIFF(DAY,d.Started,GETDATE()) < 3  
    AND d.Status <= 100)  
    UNION  
    SELECT *   
    FROM Talks d  
    (d.UserId IN (SELECT SomeGuyUserId FROM Network WHERE MainUserId = @ViewerUserId AND IsX = 1 AND IsY = 0)   
    AND DATEDIFF(DAY,d.Started,GETDATE()) < 3   
    AND d.Status <= 110)  
    

I have three questions:

  1. Which is the better way to use? (and if you can please elaborate why)
  2. Is there a better way to perform this
  3. In the second case, how can I sort (ORDER BY) the unified result (like in the first option).

PS. all tables are pretty big (> 10M records in each) and this query should be used extensively by some website.

Upvotes: 0

Views: 110

Answers (3)

Ruud Helderman
Ruud Helderman

Reputation: 11018

In many cases, SQL Server is terrible at optimizing OR, so it is very well possible you are better off with UNION. But the only way to be sure, is to test both queries against a representatively big database. Use SQL Profiler to compare the metrics.

An alternative is to refactor the WHERE clause of approach 1; try to move the OR deeper into the expression tree, or if possible, totally eliminate it. Example:

SELECT *
FROM Talks d
INNER JOIN Network n ON n.SomeGuyUserId = d.UserId
WHERE n.MainUserId = @ViewerUserId
AND n.IsX IN (0, 1)
AND n.IsY = 0
AND DATEDIFF(DAY, d.Started, GETDATE()) < 3
AND d.Status <= CASE n.IsX WHEN 0 THEN 100 WHEN 1 THEN 110 END
ORDER BY d.UserId;

Please note this query may result in duplicate records because I replaced the IN <subquery> with a join. This happens when duplicate combinations of MainUserId and SomeGuyUserId exist in table Network. If possible, use a unique constraint to avoid that situation.

Upvotes: 1

Ingo
Ingo

Reputation: 1815

Basically your first sql-query is not very complex. Compared to all the queries i've seen over the years it is in fact a simple one... i would just extract the DATEDIFF and go with that ...

SELECT *  
FROM Talks d  
WHERE DATEDIFF(DAY,d.Started,GETDATE()) < 3  
and (    (d.UserId IN (SELECT SomeGuyUserId FROM Network WHERE MainUserId = @ViewerUserId AND IsX = 0 AND IsY = 0) AND d.Status <= 100)  
     OR   
         (d.UserId IN (SELECT SomeGuyUserId FROM Network WHERE MainUserId = @ViewerUserId AND IsX = 1 AND IsY = 0) AND d.Status <= 110)   
    )
ORDER BY d.UserId;   

Upvotes: 0

Erich Kitzmueller
Erich Kitzmueller

Reputation: 36977

You have to look at the excution plans of both queries to decide that. Two different things could happen:

  • The first query results one full table scan on Talks, while the second query results in two full table scans (one for each part of the union). In that case, the first query is probably faster.

  • The first query results one full table scan on Talks, while the second query results in two index range scans. In that case, the second query is probably faster.

Upvotes: 0

Related Questions