user1813522
user1813522

Reputation: 43

Optimizing sql query with subselect list in clause

I'm using oracle 11g and trying to optimize a query.

The basic structure of the query is:

SELECT val1, val2, val3,
FROM 
table_name
WHERE
val1 in (subselect statement is here, it selects a list of possible values for 
    val1 from another table) 
and val5>=X and val5<=Y
group by val1
order by val2 desc;

My issue is that when I use a subselect, the cost is 3130. If I fill in the results of the subselect by hand - so, for example

field1 in (1, 2, 3, 4, 5, 6) 

Where (1, 2, 3, 4, 5, 6) is the results of the subselect, which in this case is all possible values of field 1, the cost of the query is 14, and oracle uses an "inlist iterator" for the group by part of the query. The results of the two queries are identical.

My question is how to mimic the behaviour of manually listing the possible values of field1 with a subselect statement. The reason I don't list those values in the query is that the possible values change based on one of the other fields, so the subselect is pulling the possible values of field1 from a 2nd table based on, say, field2.

I have an index of val1, val5, so it isn't doing any full table scans - it does do a range scan in both cases, but in the subselect case the range scan is much more expensive. However it isn't the most expensive part of the subselect query. The most expensive part is the group by, which is a HASH.

Edit - Yes, the query isn't syntactically correct - I didn't want to put up anything too specific. The actual query is fine - the selects use valid group by functions.

The subselect returns 6 values, but it can be anywhere from 1-50 or so based on the other value.

Edit2 - What I ended up doing was 2 separate queries so I could generate the list used in the subselect. I actually tried a similar test in sqlite, and it does the same thing, so this isn't just Oracle.

Upvotes: 3

Views: 6369

Answers (4)

Guillaume Outters
Guillaume Outters

Reputation: 1162

Since Oracle 10g R2 you can use the precompute_subquery hint:

WHERE val1 in (SELECT /*+ precompute_subquery */ …)

See https://stackoverflow.com/a/68492163/1346819 for a great explanation including this hint and all cautions on its use.

Upvotes: 0

DazzaL
DazzaL

Reputation: 21973

what you are seeing is a result of the IN () bieng subject to bind variable peeking. when you have histograms you write a query like "where a = 'a'" oracle will use the histogram to guess how many rows will be returned (same idea with an inlist operator, which iterates for each item and aggregates rows). if no histograms it will make a guess in the form of rows/distinct values. In a subquery oracle doesn't do this (in most cases..there is a unique case where it does).

for example:

SQL> create table test
  2  (val1 number, val2 varchar2(20), val3 number);

Table created.

Elapsed: 00:00:00.02
SQL>
SQL> insert into test select 1, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100;

100 rows created.

Elapsed: 00:00:00.01
SQL> insert into test select 2, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 1000;

1000 rows created.

Elapsed: 00:00:00.02
SQL> insert into test select 3, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100;

100 rows created.

Elapsed: 00:00:00.00
SQL> insert into test select 4, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100000;

100000 rows created.

so i have a table with 101200 rows. for VAL1 , 100 are "1" 1000 are "2" 100 are "3" and 100k are "4".

now if histograms are gathered (and we do want them in this case)

SQL> exec dbms_stats.gather_table_stats(user , 'test', degree=>4, method_opt=>'for all indexed columns size 4', estimate_percent=>100);

SQL> exec dbms_stats.gather_table_stats(user , 'lookup', degree=>4, method_opt =>'for all indexed columns size 3', estimate_percent=>100);

we see the following:

SQL> explain plan for select * from test where val1 in (1, 2, 3) ;

Explained.

SQL> @explain ""

Plan hash value: 3165434153

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |  1200 | 19200 |    23   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST  |  1200 | 19200 |    23   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | TEST1 |  1200 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

vs

SQL> explain plan for select * from test where val1 in (select id from lookup where str = 'A') ;

Explained.

SQL> @explain ""

Plan hash value: 441162525

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         | 25300 |   518K|   106   (3)| 00:00:02 |
|   1 |  NESTED LOOPS                |         | 25300 |   518K|   106   (3)| 00:00:02 |
|   2 |   TABLE ACCESS BY INDEX ROWID| LOOKUP  |     1 |     5 |     1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | LOOKUP1 |     1 |       |     0   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL          | TEST    | 25300 |   395K|   105   (3)| 00:00:02 |
----------------------------------------------------------------------------------------

where lookup table is

SQL> select * From lookup;

        ID STR
---------- ----------
         1 A
         2 B
         3 C
         4 D

(str is unique indexed and has histograms).

notice a bang on cardinality of 1200 for the inlist and a good plan, but a wildly inaccurate one on the sub query? Oracle hasn't computed histograms on the join condition, instead it has said "look, i dont know what id will be, so ill guess total rows(100k+1000+100+100)/distinct values(4) = 25300 and use that. as such its picked a full table scan.

that's all great, but how to fix it? if you know that this sub query will match a small number of rows (we do). then you have to hint the outer query to try to have it use an index. like:

SQL> explain plan for select /*+ index(t) */ * from test t where val1 in (select id from lookup where str = 'A') ;

Explained.

SQL> @explain

Plan hash value: 702117913

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         | 25300 |   518K|   456   (1)| 00:00:06 |
|   1 |  NESTED LOOPS                |         | 25300 |   518K|   456   (1)| 00:00:06 |
|   2 |   TABLE ACCESS BY INDEX ROWID| LOOKUP  |     1 |     5 |     1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | LOOKUP1 |     1 |       |     0   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| TEST    | 25300 |   395K|   455   (1)| 00:00:06 |
|*  5 |    INDEX RANGE SCAN          | TEST1   | 25300 |       |    61   (2)| 00:00:01 |
----------------------------------------------------------------------------------------

another thing is in my particular case. as val1=4 is most of the table, lets say i have my standard query: select * from test t where val1 in (select id from lookup where str = :B1);

for the possible :B1 inputs. if i know that the valid values passed in are A, B and C (ie not D which maps to id=4) . i can add this trick:

SQL> explain plan for select  * from test t where val1 in (select id from lookup where str = :b1 and id in (1, 2, 3)) ;

Explained.

SQL> @explain ""

Plan hash value: 771376936

--------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |   250 |  5250 |    24   (5)| 00:00:01 |
|*  1 |  HASH JOIN                    |                  |   250 |  5250 |    24   (5)| 00:00:01 |
|*  2 |   VIEW                        | index$_join$_002 |     1 |     5 |     1 (100)| 00:00:01 |
|*  3 |    HASH JOIN                  |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN          | LOOKUP1          |     1 |     5 |     0   (0)| 00:00:01 |
|   5 |     INLIST ITERATOR           |                  |       |       |            |          |
|*  6 |      INDEX UNIQUE SCAN        | SYS_C002917051   |     1 |     5 |     0   (0)| 00:00:01 |
|   7 |   INLIST ITERATOR             |                  |       |       |            |          |
|   8 |    TABLE ACCESS BY INDEX ROWID| TEST             |  1200 | 19200 |    23   (0)| 00:00:01 |
|*  9 |     INDEX RANGE SCAN          | TEST1            |  1200 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

now notice oracle has got a reasonable card (its pushed the 1,2,3 onto the TEST table and got 1200..not 100% accurate, as i was only filtering on noe of them but ive told oralce CERTAINLY NOT 4!

Upvotes: 6

sufleR
sufleR

Reputation: 2973

I have done some research and I think everything is explained here: oracle docs.
Just look in "How the CBO Evaluates IN-List Iterators" and compare it to "How the CBO Evaluates the IN Operator".

Your query with "field1 in (1, 2, 3, 4, 5, 6)" is matching first case but query with subselect is rewritten by Oracle.

So every query with subselect or join will have similar cost to yours unless you find very tricky way to put return from subquery as parameters.

You can always try to set more memory to sorts.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269913

You might be able to fix the statement by adding indexes on the subselect. However, you would have to post the query and execution plan to understand that. By the way, how long does the subselect itself take?

You can try one of the following two versions:

select val1, val2, val3
from table_name join
     (select distinct val from (subselect here)) t
     on table_name.val1 = t.val
where val5>=X and val5<=Y
group by val1, val2, val3
order by val2 desc;

or:

select val1, val2, val3
from table_name
where val5>=X and val5<=Y and
      exists (select 1 from (subselect here) t where t.val = table_name.val1)
group by val1, val2, val3
order by val2 desc;

These are semantically equivalent, and one of them might optimize better.

One other possibility that might work is to do the filtering after the group by. Something like:

select t.*
from (select val1, val2, val3
      from table_name
      where val5>=X and val5<=Y and
      group by val1, val2, val3
     ) t
where val1 in (subselect here)
order by val2 desc;

Upvotes: 1

Related Questions