Why is Oracle using full table scan when it should use an index?

Question

I'm doing some experimentation with query plans in Oracle, and I have the following table:

--create a table to use
create table SKEWED_DATA(
  EMP_ID int,
  DEPT int,
  COL2 int,
  CONSTRAINT SKEWED_DATA_PK PRIMARY KEY (EMP_ID)
);
--add an index on dept
create index SKEWED_DATA_INDEX1 on SKEWED_DATA(DEPT);

I then insert 1 million rows of data where 999,999 rows have dept id 1, and 1 row has dept id 99.

Before calculating statistics on the table, Oracle Autotrace shows that when running the following queries, it is using an index scan for both:

select AVG(COL2) from SKEWED_DATA D where DEPT = 1;
select AVG(COL2) from SKEWED_DATA D where DEPT = 99;

It's my understanding that it would be more efficient in this case to use a full table scan for dept id 1, and an index scan for dept id 2.

I then run the following command to generate statistics for the table:

execute DBMS_STATS.GATHER_TABLE_STATS ('HARRY','SKEWED_DATA');

And querying the dba_tab_statistics and user_tab_col_statistics confirms that stats and histograms have been gathered.

Running an autotrace on the following queries now shows full table scan for both!

select AVG(COL2) from SKEWED_DATA D where DEPT = 1;
select AVG(COL2) from SKEWED_DATA D where DEPT = 99;

My question is: why is Oracle using a full table scan for dept id 99 when there is only 1 row with this value?

UPDATE

I tried running the query for dept 99 with a hint to force Oracle to use the index, and whilst Autotrace believes it to be less efficient, the time it takes is 0.001 seconds, compared to 0.03 seconds when using the full table scan, thus proving (I think?) my theory that Oracle should be using the index in this instance.

select /*+ INDEX(D SKEWED_DATA_INDEX1) */ AVG(COL2) from SKEWED_DATA D where DEPT = 99;

user1578653 · Accepted Answer

OK, I think I might have solved it. When I had 999,999 rows with dept 1 and 1 row with dept 99, I inspected the number of histogram buckets by running the following query:

select COLUMN_NAME, HISTOGRAM, NUM_BUCKETS, NUM_DISTINCT from USER_TAB_COL_STATISTICS where  TABLE_NAME = 'SKEWED_DATA';

This showed that there are 2 distinct values but only 1 bucket. If I change the stats gathering to this:

execute DBMS_STATS.GATHER_TABLE_STATS('HARRY','SKEWED_DATA',estimate_percent=>100);

It then correctly comes up with 2 buckets, and the autotrace shows the 'correct' execution plans. So, I guess it's because of the extreme 'skewness' of my data that Oracle cannot generate the correct stats for it unless the estimate_percent is massive.

Interestingly if I have slightly less skewed data (say about 2-3% of all records with a dept id of 99) Oracle does treat it correctly even when I leave the estimate_percent as default.

So, the moral of the story seems to be: if you have ridiculously skewed data like this and Oracle is not using the correct execution plan, try playing around with the estimate_percent parameter.

Why is Oracle using full table scan when it should use an index?

UPDATE

Answers (1)

Related Questions