Oracle is not using the Indexes

Question

I have a very large table in Oracle 11g with a very simple index on a char field (normally Y or N). If I just execute the query as below, it takes around 10 s to return:

select QueueId, QueueSiteId, QueueData from queue where QueueProcessed = 'N'

However, if I force it to use the index I created, it takes 80 ms:

select /*+ INDEX(avaqueue QUEUEPROCESSED_IDX) */ QueueId, QueueSiteId, QueueData  
  from queue where QueueProcessed = 'N'

Also, if I run both under explain plan for, as below:

explain plan for select QueueId, QueueSiteId, QueueData 
  from queue where QueueProcessed = 'N'

and

explain plan for select /*+ INDEX(avaqueue QUEUEPROCESSED_IDX) */ 
  QueueId, QueueSiteId, QueueData 
  from queue where QueueProcessed = 'N'

For the first plan I got:

------------------------------------------------------------------------------

Plan hash value: 803924726

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |   691K|   128M| 12643   (1)| 00:02:32 |
|*  1 |  TABLE ACCESS FULL| AVAQUEUE |   691K|   128M| 12643   (1)| 00:02:32 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("QUEUEPROCESSED"='N')

For the second plan I got:

Plan hash value: 2012309891

--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |   691K|   128M| 24386   (1)| 00:04:53 |
|   1 |  TABLE ACCESS BY INDEX ROWID| AVAQUEUE           |   691K|   128M| 24386   (1)| 00:04:53 |
|*  2 |   INDEX RANGE SCAN          | QUEUEPROCESSED_IDX |   691K|       |  1297   (1)| 00:00:16 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("QUEUEPROCESSED"='N')

------------------------------------------------------------------------------

This shows that unless I explicitly tell Oracle to use the index, it does not use it. My question is: why is Oracle not using this index? Oracle is normally smart enough to make decisions 10 times better than I can. This is the first time I have actually had to force Oracle to use an index, and I am not very comfortable with it.

Does anyone have a good explanation for Oracle's decision not to use the index in this very explicit case?

Jon Heller · Accepted Answer

The QueueProcessed column is probably missing a histogram, so Oracle does not know the data is skewed.

If Oracle does not know the data is skewed, it will assume the equality predicate, QueueProcessed = 'N', returns DBA_TABLES.NUM_ROWS / DBA_TAB_COLUMNS.NUM_DISTINCT rows. If the column holds only Y and N, that is half the rows in the table. Based on the 80 ms return time, the real number of rows returned is small.
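A quick way to see this on your own table is to compare the real distribution with the figure derived from the statistics. This is only a sketch: it assumes the table is named QUEUE and lives in the current schema, so it reads USER_TABLES and USER_TAB_COLUMNS rather than the DBA_ views, and it ignores nulls for simplicity.

```sql
-- Actual distribution: how many rows really have each value?
select queueprocessed, count(*) as actual_rows
from queue
group by queueprocessed;

-- What the optimizer assumes without a histogram: NUM_ROWS / NUM_DISTINCT.
-- Assumes QUEUE is in the current schema and statistics have been gathered.
select t.num_rows,
       c.num_distinct,
       c.histogram,
       round(t.num_rows / nullif(c.num_distinct, 0)) as estimated_rows
from user_tables t
join user_tab_columns c on c.table_name = t.table_name
where t.table_name = 'QUEUE'
  and c.column_name = 'QUEUEPROCESSED';
```

If actual_rows for N is far below estimated_rows and HISTOGRAM shows NONE, the optimizer is probably working from the uniform assumption described above.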

Index range scans generally only work well when they select a small percentage of the rows. Index range scans read from a data structure one block at a time. And if the data is randomly distributed, it may need to read every block of data from the table anyway. For those reasons, if the query accesses a large portion of the table, it is more efficient to use a multi-block full table scan.

The bad cardinality estimate from the skewed data causes Oracle to think a full table scan is better. Creating a histogram will fix the issue.
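One way to confirm a bad estimate is to compare estimated and actual row counts for the real execution. The sketch below assumes you can run the query in your own SQL*Plus session and can read the V$ views that DBMS_XPLAN.DISPLAY_CURSOR depends on.

```sql
-- Turn off serveroutput so the "last" statement in the session is the query itself.
set serveroutput off

-- The hint makes Oracle record actual row counts for each plan step.
select /*+ gather_plan_statistics */ QueueId, QueueSiteId, QueueData
from queue where QueueProcessed = 'N';

-- E-Rows is the optimizer's estimate, A-Rows is what really happened.
select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));
```

A large gap between E-Rows and A-Rows on the table access step points at the statistics rather than at the index.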

Sample schema

Create a table, fill it with skewed data, and gather statistics the first time.

drop table queue;

create table queue(
    queueid number,
    queuesiteid number,
    queuedata varchar2(4000),
    queueprocessed varchar2(1)
);
create index QUEUEPROCESSED_IDX on queue(queueprocessed);

--Skewed data - only 100 of the 100000 rows are set to N.
insert into queue
select level, level, level, decode(mod(level, 1000), 0, 'N', 'Y')
from dual connect by level <= 100000;

begin
    dbms_stats.gather_table_stats(user, 'QUEUE');
end;
/

The first execution will have the problem.

In this case the default statistics settings do not gather histograms the first time. The plan shows a full table scan and estimates Rows=50000, exactly half.

explain plan for
select QueueId, QueueSiteId, QueueData 
from queue where QueueProcessed = 'N';

select * from table(dbms_xplan.display);

Plan hash value: 1157425618

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       | 50000 |   878K|   103   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| QUEUE | 50000 |   878K|   103   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("QUEUEPROCESSED"='N')

Create a histogram

The default statistics settings are usually sufficient, but histograms may not be collected for several reasons. They may be manually disabled: check for tasks, jobs, or preferences set by the DBA.

Also, histograms are only automatically collected on columns that are both skewed and used. Gathering histograms can take time, so there is no need to create a histogram on a column that is never used in a relevant predicate. Oracle tracks when a column is used and could benefit from a histogram, although that data is lost if the table is dropped.
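The recorded column usage can be inspected. The following is a sketch, assuming a SQL*Plus session and a release where DBMS_STATS.REPORT_COL_USAGE is available (I believe it appeared in 11.2, so it may be missing on older versions), and assuming you have the privileges to flush monitoring information.

```sql
-- Column usage is held in memory and written out periodically; force a flush
-- so that recently run queries are reflected (may need extra privileges).
begin
    dbms_stats.flush_database_monitoring_info;
end;
/

-- SQL*Plus setting so the CLOB report is not truncated.
set long 100000

-- Reports the columns of QUEUE and the kinds of predicates seen on them.
select dbms_stats.report_col_usage(user, 'QUEUE') from dual;
```

If QUEUEPROCESSED does not appear in the report, Oracle has not yet seen it used in a predicate, and the default settings are unlikely to build a histogram on it.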

Running a sample query and re-gathering statistics will make the histogram appear:

select QueueId, QueueSiteId, QueueData 
from queue where QueueProcessed = 'N';

begin
    dbms_stats.gather_table_stats(user, 'QUEUE');
end;
/

Now Rows=100 and the index is used.

explain plan for
select QueueId, QueueSiteId, QueueData 
from queue where QueueProcessed = 'N';

select * from table(dbms_xplan.display);

Plan hash value: 2630796144

----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                    |   100 |  1800 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| QUEUE              |   100 |  1800 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | QUEUEPROCESSED_IDX |   100 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("QUEUEPROCESSED"='N')

Here's the histogram:

select column_name, histogram
from dba_tab_columns
where table_name = 'QUEUE'
order by column_name;

COLUMN_NAME      HISTOGRAM
-----------      ---------
QUEUEDATA        NONE
QUEUEID          NONE
QUEUEPROCESSED   FREQUENCY
QUEUESITEID      NONE

Create the histogram

Try to determine why the histogram was missing. Check that statistics are gathered with the defaults, that there are no unusual column or table preferences, and that the table is not constantly dropped and reloaded.

If you cannot rely on the default statistics job for your process, you can manually gather histograms with the method_opt parameter like this:
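These checks could look roughly like the queries below. They are a sketch under assumptions: the table is QUEUE in the current schema, and the last query needs access to the DBA_ views.

```sql
-- Effective METHOD_OPT for the table; the default is FOR ALL COLUMNS SIZE AUTO.
select dbms_stats.get_prefs('METHOD_OPT', user, 'QUEUE') as method_opt from dual;

-- When statistics were last gathered, and whether they are locked.
select table_name, last_analyzed, stattype_locked
from user_tab_statistics
where table_name = 'QUEUE';

-- Whether the automatic statistics task is enabled.
select client_name, status
from dba_autotask_client
where client_name = 'auto optimizer stats collection';
```

A METHOD_OPT ending in SIZE 1, locked statistics, or a disabled task would each be a plausible reason for the missing histogram.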

begin
    dbms_stats.gather_table_stats(user, 'QUEUE', method_opt=>'for columns size 254 queueprocessed');
end;
/
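If the goal is for every later gather on the table, including the automatic job, to keep that histogram, a table preference is one option. This is a sketch that assumes you want the other columns left on the AUTO setting; the preference belongs to the table, so it has to be set again if the table is dropped and recreated.

```sql
-- Assumption: keep a histogram on QUEUEPROCESSED for all future gathers,
-- and let Oracle decide for the remaining columns.
begin
    dbms_stats.set_table_prefs(
        ownname => user,
        tabname => 'QUEUE',
        pname   => 'METHOD_OPT',
        pvalue  => 'for all columns size auto for columns size 254 queueprocessed');
    -- Re-gather so the new preference takes effect immediately.
    dbms_stats.gather_table_stats(user, 'QUEUE');
end;
/
```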
