Comencau
Comencau

Reputation: 1193

Distinct values on indexed column

I have a table with 115 M rows. One of the column is indexed (index called "my_index" on explain plan below) and not nullable. Moreover, this column has just one distinct value so far.

When I do

select distinct my_col from my_table;

, it takes 230 seconds which is very long. Here is the explain plan.

| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT   |          |     1 |     3 | 22064  (90)| 00:03:23 |
|   1 |  SORT UNIQUE NOSORT|          |     1 |     3 | 22064  (90)| 00:03:23 |
|   2 |   INDEX FULL SCAN  | my_index |   115M|   331M|  2363   (2)| 00:00:22 |

Since the column has just one distinct value, why does it take so long ? Why Oracle does not just check index entries and fastly find that there is just one possible value for this column ? On the explain plan above, the index scanning seems to take 22 s but what is this "SORT UNIQUE NOSORT" which takes ages ?

Thank you in advance for your help

Upvotes: 3

Views: 6168

Answers (4)

David Balažic
David Balažic

Reputation: 1473

If there are only a few distinct values of the column, try a compressed index:

create index my_index on my_table (my_col) compress;

This will store each distinct value of the column only once, hopefully reducing the execution time of your query.

As a bonus: use this to see the actual plan used for a query:

select /*+ gather_plan_statistics */ distinct my_col from my_table;
SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR);

The gather_plan_statistics hint will collect more data (it will take longer to execute), but it works without it too. See the documentation of DBMS_XPLAN.DISPLAY_CURSOR for more details.

Upvotes: 1

Jon Heller
Jon Heller

Reputation: 36892

SORT UNIQUE NOSORT is not taking too long. You are looking at the estimates from a bad execution plan that is probably the result of unreasonable optimizer parameters. For example, setting the parameter OPTIMIZER_INDEX_COST_ADJ to 1 instead of the default 100 can produce a similar plan. Most likely your query runs slowly because your database is busy or just slow.

What's wrong with the posted execution plan?

The posted execution plan seems unreasonable. Retrieving data should take much longer than simply throwing out duplicates. And the consumer operation, SORT UNIQUE NOSORT, can start at almost the same time as the producer operation, INDEX FULL SCAN. Normally they should finish at almost the same time. The execution plan in the question shows the optimizer estimates. The screenshot below of an active report shows the actual timelines for a very similar query. All steps are starting and stopping at almost the same time.

enter image description here

Sample setup with reasonable plan

Below is a very similar setup, but with a very plain configuration. Same number of rows read (115 million) and returned (1), and almost the exact same segment size (329MB vs 331 MB). The plan shows almost all of the time being spent on the INDEX FULL SCAN.

drop table test1 purge;
create table test1(a number not null, b number, c number) nologging;
begin
    for i in 1 .. 115 loop
        insert /*+ append */ into test1 select 1, level, level
        from dual connect by level <= 1000000;
        commit;
    end loop;
end;
/
create index test1_idx on test1(a);
begin
    dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
explain plan for select /*+ index(test1) */ distinct a from test1;
select * from table(dbms_xplan.display);

Plan hash value: 77032494

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |     3 |   244K  (4)| 00:48:50 |
|   1 |  SORT UNIQUE NOSORT|           |     1 |     3 |   244K  (4)| 00:48:50 |
|   2 |   INDEX FULL SCAN  | TEST1_IDX |   115M|   329M|   237K  (1)| 00:47:30 |
--------------------------------------------------------------------------------

Re-creating a bad plan

--Set optimizer_index_cost_adj to a ridiculously low value.
--This changes the INDEX FULL SCAN estimate from 47 minutes to 29 seconds. 
alter session set optimizer_index_cost_adj = 1;

--Changing the CPUSPEEDNW to 800 will exactly re-create the time estimate
--for SORT UNIQUE NOSORT.  This value is not ridiculous, and it is not
--something you should normally change.  But it does imply your CPUs are
--slow.  My 2+ year-old desktop had an original score of 1720.
begin
    dbms_stats.set_system_stats( 'CPUSPEEDNW', 800);
end;
/

explain plan for select /*+ index(test1) */ distinct a from test1;
select * from table(dbms_xplan.display);

Plan hash value: 77032494

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |     3 | 16842  (86)| 00:03:23 |
|   1 |  SORT UNIQUE NOSORT|           |     1 |     3 | 16842  (86)| 00:03:23 |
|   2 |   INDEX FULL SCAN  | TEST1_IDX |   115M|   329M|  2389   (2)| 00:00:29 |
--------------------------------------------------------------------------------

How to investigate

Check the parameters.

select name, value from v$parameter where name like 'optimizer_index%'

NAME                        VALUE
----                        -----
optimizer_index_cost_adj    1
optimizer_index_caching     0

Also check the system statistics.

select * from sys.aux_stats$;

+---------------+------------+-------+------------------+
|     SNAME     |   PNAME    | PVAL1 |      PVAL2       |
+---------------+------------+-------+------------------+
| SYSSTATS_INFO | STATUS     |       | COMPLETED        |
| SYSSTATS_INFO | DSTART     |       | 09-23-2013 17:52 |
| SYSSTATS_INFO | DSTOP      |       | 09-23-2013 17:52 |
| SYSSTATS_INFO | FLAGS      |     1 |                  |
| SYSSTATS_MAIN | CPUSPEEDNW |   800 |                  |
| SYSSTATS_MAIN | IOSEEKTIM  |    10 |                  |
| SYSSTATS_MAIN | IOTFRSPEED |  4096 |                  |
| SYSSTATS_MAIN | SREADTIM   |       |                  |
| SYSSTATS_MAIN | MREADTIM   |       |                  |
| SYSSTATS_MAIN | CPUSPEED   |       |                  |
| SYSSTATS_MAIN | MBRC       |       |                  |
| SYSSTATS_MAIN | MAXTHR     |       |                  |
| SYSSTATS_MAIN | SLAVETHR   |       |                  |
+---------------+------------+-------+------------------+

To find out where the time is really spent, use a tool like the active report.

select dbms_sqltune.report_sql_monitor(sql_id => '5s63uf4au6hcm',
    type => 'active') from dual;

Upvotes: 3

Srini V
Srini V

Reputation: 11365

Re analyse the table.

EXEC dbms_stats.gather_table_stats('owner','table_name',cascade=>true,method_opt=>'FOR ALL INDEXED COLUMNS SIZE ');

Change Index Type

One distinct value out of 115M rows??!! That's what called as low cardinality, not so good for the 'normal' B-Tree index Consider a bitmapped index. (If at all you have B-tree)

Reconstructing Query

If you are sure that no new values will be added to this column then please remove the distinct clause and rather use as Abhijith said.

Upvotes: 3

Abhijith Nagarajan
Abhijith Nagarajan

Reputation: 4030

See the explain plan carefully.

  1. It scans the whole index to know what you are trying to fetch
  2. Then applies distinct function (try to retrieve the unique values). Though you say there is only one unique value, it has to scan the whole index to get the values. Oracle does not know that there is only one distinct value in the index. You can restrict the rownum = 1 to get the quick answer.

Try this to get the quick answer

select my_col from my_table where rownum = 1;

It is highly unfavourable to add an index on a column which has very less distribution. This is bad for the table and overall for the application as well. This just does not make any sense

Upvotes: 0

Related Questions