David
David

Reputation: 73

Oracle - Conditional where clause performance issue

Considering two tables, joined in such a way :

select *
from   table_1
left   join table_2
on     table_1.lnk_id = table_2.lnk_id
;

There is an index on "lnk_id" on table_1, and on table_2. This returns 80 Million rows.

I am using conditional where clause, with variables set by my front-end (APEX):

:all : If=1, should return all rows.

:desired_id : The object I want to return. Can be a null value, in which case I want to return null value rows only.

I coded this first :

select *
from   table_1
left   join table_2
on     table_1.some_id = table_2.some_id

where  (
         case when :all = 1
              then 1
              when :desired_id is null and table_2.desired_id is null
              then 1
              when :desired_id = table_2.desired_id
              then 1
              else 0
              end = 1
       )

Given :all = 0 and :desired_id = some non-null value to select the rows the user desires, I experience terrible performance.

I learned I had to avoid "case" in "where" clause, so adapted to :

where (
        :all = 1
        or (:desired_id is null and table_2.desired_id is null)
        or :desired_id = table_2.desired_id
      )

No chance, this is as slow as the "case" solution.

I realized this :

where  (:desired_id = table_2.desired_id);

--> 0.047s - Super fast

where  (:desired_id = table_2.desired_id or 0 = 1);

--> 0.062s - Super fast

where  (:desired_id = table_2.desired_id or :all = 1);

--> 235s - Super slow

So I can definitely find the desired object across my 80 M rows in no time, with the where( ... = ... or ... = 1) construction : the optimizer must take the wrong decision when I use :all.

Would anyone be able to guide ?

I would rather avoid the build-a-dynamic-query solution if possible as it makes it a lot more complex to implement and manage I believe and this really sounds like ... it should work with plain SQL.

-- Edit to add the plans --

Good plan

select *
from   table_2
left   join table_1
on     table_2.lnk_id = table_1.lnk_id
where  (:desired_id = table_1.desired_id or 0 = 1);


Plan hash value: 1995399472

--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                  |   129 | 42183 |    45   (0)| 00:00:01 |       |       |
|   1 |  NESTED LOOPS                         |                  |   129 | 42183 |    45   (0)| 00:00:01 |       |       |
|   2 |   NESTED LOOPS                        |                  |   138 | 42183 |    45   (0)| 00:00:01 |       |       |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| TABLE_1          |     3 |   435 |     7   (0)| 00:00:01 |       |       |
|*  4 |     INDEX RANGE SCAN                  | TABLE_1_I1       |     3 |       |     3   (0)| 00:00:01 |       |       |
|*  5 |    INDEX RANGE SCAN                   | TABLE_2_I2       |    46 |       |     3   (0)| 00:00:01 |       |       |
|   6 |   TABLE ACCESS BY GLOBAL INDEX ROWID  | TABLE_2          |    40 |  7280 |    21   (0)| 00:00:01 | ROWID | ROWID |
--------------------------------------------------------------------------------------------------------------------------



Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("TABLE_1"."DESIRED_ID"=:DESIRED_ID)
   5 - access("TABLE_2"."LNK_ID"="TABLE_1"."LNK_ID")

Wrong plan

explain plan for
select *
from   table_2
left   join table_1
on     table_2.lnk_id = table_1.lnk_id
where    (:desired_id = table_1.desired_id or :p3070100_all = 1);

Plan hash value: 94704160

----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |               |    79M|    24G|       |  1441K  (1)| 00:00:57 |       |       |
|*  1 |  FILTER                |               |       |       |       |            |          |       |       |
|*  2 |   HASH JOIN RIGHT OUTER|               |    79M|    24G|   484M|  1441K  (1)| 00:00:57 |       |       |
|   3 |    TABLE ACCESS FULL   | TABLE_1       |  3238K|   447M|       | 19152   (1)| 00:00:01 |       |       |
|   4 |    PARTITION RANGE ALL |               |    79M|    13G|       |   668K  (1)| 00:00:27 |     1 |1048575|
|   5 |     TABLE ACCESS FULL  | TABLE_2       |    79M|    13G|       |   668K  (1)| 00:00:27 |     1 |1048575|
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TABLE_1"."DESIRED_ID"=:DESIRED_ID OR TO_NUMBER(:P3070100_ALL)=1)
   2 - access("TABLE_2"."LNK_ID"="TABLE_1"."LNK_ID"(+))   

Thank you, David

Upvotes: 2

Views: 1822

Answers (3)

robertus
robertus

Reputation: 356

When you add or to your where clause and the part of that condition is parameter or function, database cannot just simply use an index. It's because there is "another option" and cost based optimiser (CBO) often choose to "FULL TABLE SCAN". You can make your query more simple for CBO - for example:

  1. at application level choose to create different where clause based on :all value,

  2. use some tricks making condition look simple - like instead of using :all parameter use only :desired_id and for getting all results, just pass "null" as value, then you can do something like that:

    where table_2.desired_id = nvl(:desired_id, table_2.desired_id)
    

If there is index on table_2.desired_id, CBO should choose "range scan" or "unique scan" (for unique index).

You should always generate explain plan for your queries and look for "full scans", "nested loops" and "cartesian joins" with large tables - those are things you should avoid.

Update (2019-02-01)

There is 3rd option, when you would like to have "all in one query", so without any additional logic at application level (choosing between 2 queries) or using dynamic SQL. There is an option to make 2 queries in one with union all and write them in a way that always only one part will be run by database.

Below is improved version of query proposed by JPG:

select *
from table_2
left join table_1 on table_2.lnk_id = table_1.lnk_id
where nvl(:all,2) != 1
  and :desired_id = table_2.desired_id
union all
select *
from table_2
left join table_1 on table_2.lnk_id = table_1.lnk_id
where :desired_id = table_2.desired_id
  and :all = 1
;

Take into consideration that explain plain could "show" more work for this, than for the other plans, but database should in run-time resolved one of condition as "always false" and query only for one part.

-- Second Update --

Ok, I've now read one of your comments where you've written:

"All" returns every single row, "None" returns all rows with null values for that column, "specific value" returns rows matching that specific value...

Let's say all can take 3 values:

  • 0 - which means specific row
  • 1 - which means all rows
  • 2 - which means all rows with NULL in table_2.desired_id

Then the query will be:

-- Specific row
select *
from   table_2
left   join table_1
on     table_2.lnk_id = table_1.lnk_id
where  :all = 0
  and  :desired_id = table_2.desired_id
union all
-- All rows with null
select *
from   table_2
left   join table_1
on     table_2.lnk_id = table_1.lnk_id
where  :all = 2
  and  table_2.desired_id is NULL
union all
-- All rows
select *
from   table_2
left   join table_1
on     table_2.lnk_id = table_1.lnk_id
where  :desired_id = table_2.desired_id
  and    :all = 1
;

But you should know that simple index doesn't work with NULLs. So if you have index on table_2.desired_id:

create index idx_table_2_desired_id on table_2(desired_id);

It will not work, but complex one might do, for ex.

create index idx_table_2_desired_id on table_2(desired_id, 1);

will allow database to search for NULLs in desired_id in such index.

Upvotes: 1

JPG
JPG

Reputation: 11

As robertus wrote, you should avoid using "OR" in a query which is working on indexed columns I suggest to replace the following query with "OR"

select *
from   table_2
left   join table_1
on     table_2.lnk_id = table_1.lnk_id
where  (:desired_id = table_2.desired_id or :all = 1);

by a much more powerful solution in terms of cost.

select *
from   table_2
left   join table_1
on     table_2.lnk_id = table_1.lnk_id
where  :desired_id = table_2.desired_id
  union
select *
from   table_2
left   join table_1
on     table_2.lnk_id = table_1.lnk_id
where  :desired_id <> table_2.desired_id
and    :all = 1
;

Upvotes: 1

krokodilko
krokodilko

Reputation: 36107

A glance at the wrong plan explains the cause of the problem, which is here:

1 - filter("TABLE_1"."DESIRED_ID"=:DESIRED_ID OR TO_NUMBER(:P3070100_ALL)=1) 

The optimizer is unable to use TABLE_1_I1 index because of this OR condition.


Do not use this bind variable TO_NUMBER(:P3070100_ALL)=1 in the query at all, use dynamic SQL and two version of the query depending on the value of :P3070100 instead.

If :P3070100 <> 1 use this query, which will use the index and will be fast:

select *
from   table_2
left   join table_1
on     table_2.lnk_id = table_1.lnk_id
where    :desired_id = table_1.desired_id ;

and when :P3070100 = 1 use this query, which will be slow (because it joins all rows from both tables):

select *
from   table_2
left   join table_1
on     table_2.lnk_id = table_1.lnk_id

Upvotes: 0

Related Questions