Reputation: 19308
I've got a complex SQL where clause that just got more complex because of a requirements change. There are four basic sets of cases, each with a different combination of other factors. It's more readable (in my opinion) to have the four cases as separate branches of the where clause, and to repeat the redundant criteria in each branch. But I don't know how well the database engine will optimize that.
Here's the expression in its redundant form. I've replaced the actual criteria with letters. A is the "branching" criterion, which comes in four forms. All of the expressions are in the form field='value'
, unless otherwise noted.
A1 AND B AND C AND D
OR A2 AND B AND C AND D AND E AND F1 AND G
OR A3 AND A3a AND B AND C AND D AND E AND F1 AND G
OR A4 AND B AND C AND D AND F2
All of the A's except A4 are in the form field in ('value1','value2')
. D is field > 'value'
. G is in the form field not in (subquery)
.
Here's the expression, factored to (I think) its least redundant form.
B AND C AND D AND (
A1
OR (
E AND F1 AND G AND (
A2
OR (A3 AND A3a)
)
)
OR (A4 AND F2)
My question is whether I should factor this expression into its simplest (least redundant) logical form, or whether it's OK to keep it in it's more redundant but also more readable form. The target database is Sybase, but I'd like to know the answer for RDMBSs generally.
Upvotes: 0
Views: 3849
Reputation: 425421
In an RDBMS
world I wouldn't bother of redundancy much, efficiency is more important here.
In your case, I would UNION
all the four queries using A
's as a top condition, like this:
SELECT *
FROM mytable
WHERE A1 AND B AND C
UNION
SELECT *
FROM mytable
WHERE A2 AND B AND C AND D AND E AND F1 AND G
…
I didn't look into Sybase
for more than 7
years, but in all major RDBMS
's UNION
's are more efficient than OR
's.
See this article in my blog for approach to a silimar problem in Oracle
:
UNION
in Oracle
and also this article for comparison of UNION
versus OR
in MySQL
:
UNION
compared to one of OR
in MySQL
I think these approaches will work well for Sybase
too.
You also need to create indexes on columns used in your conditions to benefit from UNION
's
Update:
Since condition G
is a subquery, it may probably happen that it needs a HASH JOIN
to perform fast. HASH JOIN
requires a full scan on all unfiltered values, that's why it may probable be better to filter all values in a single fullscan and then perform a HASH JOIN
:
SELECT *
FROM (
SELECT *
FROM foo
WHERE condition_set_1
UNION
SELECT *
FROM foo
WHERE condition_set_2_but_no_g
…
) q
WHERE G
To make any further judgements, it will be much better to see the query itself, really.
Upvotes: 2
Reputation: 4585
If I were attacking this problem on M$ SQL Server, I would write it the way I wanted and look at the query execution plan. If it (a) ran slow and (b) had a bad execution plan, then I would refactor and document. Not sure what the mechanism is in Sybase for viewing HOW the optimizer executes the query.
Upvotes: 2
Reputation: 115488
I would refactor it. Eventually, duplicated logic will run you into problems. The second example may take a couple seconds longer to understand, but in the big scope of things it's easier to see what is going on as you can quickly look at the entire where clause and begin to determine what affects what.
Upvotes: 0