John M Gant
John M Gant

Reputation: 19308

Complex SQL where clause: whether to factor logic

I've got a complex SQL where clause that just got more complex because of a requirements change. There are four basic sets of cases, each with a different combination of other factors. It's more readable (in my opinion) to have the four cases as separate branches of the where clause, and to repeat the redundant criteria in each branch. But I don't know how well the database engine will optimize that.

Here's the expression in its redundant form. I've replaced the actual criteria with letters. A is the "branching" criterion, which comes in four forms. All of the expressions are in the form field='value', unless otherwise noted.

A1 AND B AND C AND D
OR A2 AND B AND C AND D AND E AND F1 AND G
OR A3 AND A3a AND B AND C AND D AND E AND F1 AND G
OR A4 AND B AND C AND D AND F2

All of the A's except A4 are in the form field in ('value1','value2'). D is field > 'value'. G is in the form field not in (subquery).

Here's the expression, factored to (I think) its least redundant form.

B AND C AND D AND (
    A1
    OR (
        E AND F1 AND G AND (
            A2
            OR (A3 AND A3a)
        )
    )
    OR (A4 AND F2)

My question is whether I should factor this expression into its simplest (least redundant) logical form, or whether it's OK to keep it in it's more redundant but also more readable form. The target database is Sybase, but I'd like to know the answer for RDMBSs generally.

Upvotes: 0

Views: 3849

Answers (3)

Quassnoi
Quassnoi

Reputation: 425421

In an RDBMS world I wouldn't bother of redundancy much, efficiency is more important here.

In your case, I would UNION all the four queries using A's as a top condition, like this:

SELECT  *
FROM    mytable
WHERE   A1 AND B AND C
UNION
SELECT  *
FROM    mytable
WHERE   A2 AND B AND C AND D AND E AND F1 AND G
…

I didn't look into Sybase for more than 7 years, but in all major RDBMS's UNION's are more efficient than OR's.

See this article in my blog for approach to a silimar problem in Oracle:

and also this article for comparison of UNION versus OR in MySQL:

I think these approaches will work well for Sybase too.

You also need to create indexes on columns used in your conditions to benefit from UNION's

Update:

Since condition G is a subquery, it may probably happen that it needs a HASH JOIN to perform fast. HASH JOIN requires a full scan on all unfiltered values, that's why it may probable be better to filter all values in a single fullscan and then perform a HASH JOIN:

SELECT  *
FROM    (
        SELECT  *
        FROM    foo
        WHERE   condition_set_1
        UNION
        SELECT  *
        FROM    foo
        WHERE   condition_set_2_but_no_g
        …
        ) q
WHERE   G

To make any further judgements, it will be much better to see the query itself, really.

Upvotes: 2

Bill
Bill

Reputation: 4585

If I were attacking this problem on M$ SQL Server, I would write it the way I wanted and look at the query execution plan. If it (a) ran slow and (b) had a bad execution plan, then I would refactor and document. Not sure what the mechanism is in Sybase for viewing HOW the optimizer executes the query.

Upvotes: 2

kemiller2002
kemiller2002

Reputation: 115488

I would refactor it. Eventually, duplicated logic will run you into problems. The second example may take a couple seconds longer to understand, but in the big scope of things it's easier to see what is going on as you can quickly look at the entire where clause and begin to determine what affects what.

Upvotes: 0

Related Questions