MedEc
MedEc

Reputation: 179

How can this query be optimized please?

I have performence issues with the following query :

    SELECT A,B,C,D,E,F
    FROM TABLE1 T1
    INNER JOIN TABLE2 T2 
    ON (((T1.E IS NULL OR T2.E IS NULL) AND  T1.F= T2.F) 
    OR((T1.E IS NOT NULL OR T2.E IS NOT NULL) AND T1.E = T2.E))

More than 30 min to return about 1000 rows

I've tried this :

    SELECT A,B,C,D,E,F
    FROM TABLE1 T1
    INNER JOIN TABLE2 T2 
    ON (((COALESCE(T1.E,-1) = COALESCE(T2.E,-1) 
    AND  ((T1.F= T2.F) 
    OR(T1.E = T2.E))))) 

but gives less results than the first one

Can you help me to find another way to write it in oreder to reduce execution time please ?

I'm using SQL Server 2016

Upvotes: 1

Views: 72

Answers (3)

Joel Coehoorn
Joel Coehoorn

Reputation: 415690

Try this:

SELECT A,B,C,D,E,F
FROM TABLE1 T1
INNER JOIN TABLE2 T2 ON T1.F = T2.F
WHERE T1.E IS NULL OR T2.E IS NULL

UNION

SELECT A,B,C,D,E,F
FROM TABLE1 T1
INNER JOIN TABLE2 T2 ON T1.E = T2.E
WHERE COALESCE(T1.E, T2.E) IS NOT NULL

You might want a UNION ALL, but this should match the original.

This also exposes an interesting quirk in the original logic you may want to reconsider. If the E field from one table is NULL, but not the other, the original code would make checks on both the E and F fields. Which is interesting, because for the E field we know one side is null, but the other is not, so that case can't ever be true... but the logic says to still make the comparison.

It's hard to know what you're doing with the generic names, but there's definitely room to clean up that conditional check. Before worrying about matching to your first results, go back and make sure those first results clearly and accurately state what you want to accomplish, even if that means making the query even slower or longer.

Then, only when you are sure you have a query that both produces accurate results and describes them in an understandable way, you can start looking for different or clever ways to express the same logic that might perform better. But if you don't first take the step of better-defining your logic, you won't be able to validate your optimizations and you'll risk quickly producing incorrect data.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269603

Non-equality conditions -- such as OR -- pretty much kill JOIN performance, especially in databases such as SQL Server that do not use indexes in such cases.

I would recommend a two-join approach, but you are going to have to fix the SELECT because it is not clear where the columns come from.

SELECT --A, B, C, D, E, F,
       T1.A,
       COALESCE(T2_1.B, T2_2.B) as B,
       . . . 
FROM TABLE1 T1 INNER JOIN
     TABLE2 T2_1
     ON T2.F = T1.F AND
        (T1.E IS NULL OR T2_1.E IS NULL) LEFT JOIN
     TABLE2 T2_2
     ON T2_2.E = T1.E   -- E cannot be NULL
WHERE T2_1.F IS NOT NULL OR T2_2.E IS NOT NULL;  -- checks for a match for either condition

Then for performance, you want indexes on TABLE2(F, E) and TABLE2(E).

Upvotes: 2

ilyazakharov
ilyazakharov

Reputation: 288

Statement OR might extremely decrease execution time. Try to get rid of it. Maybe something like this would do:

 SELECT A,B,C,D,E,F
 FROM TABLE1 T1
    LEFT JOIN TABLE2 T2 
        ON T1.E = T2.E
    LEFT JOIN TABLE2 T22
        ON T1.F= T22.F
        AND T2.E IS NULL
 WHERE NOT (T2.E IS NULL AND T22.F IS NULL) 

Upvotes: 0

Related Questions