mehmet sahin
mehmet sahin

Reputation: 812

Oracle Optimizer Awkwardly Doesn't Prefer to Use Index

I'm joining a table with itself but although I expect this operation to use index, it seems it doesn't. There are 1 million records on the table(MY_TABLE) and the query I run is executing on about 10 thousand records.(So it is lower than %1 of whole table.)

Test Case:

explain plan for
SELECT *
  FROM SCHM.MY_TABLE A1, SCHM.MY_TABLE A2
 WHERE     (A1.K_ID = '123abc')
       AND A1.HDT = A2.HDT
       AND A2.C_DATE BETWEEN A1.SYSDATE - 0.0004 
                           AND A1.SYSDATE + 0.0004 
       AND A1.GKID = A2.GKID;


Plan hash value: 1210306805

----------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                  |     3 |    81 |    28   (0)| 00:00:01 |
|*  1 |  FILTER                               |                  |       |       |            |          |
|*  2 |   HASH JOIN                           |                  |     3 |    81 |    28   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE          |     3 |    45 |     7   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | IX_MY_TABLE_C_DATE |     3 |       |     4   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE          |    17 |   204 |    21   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN                  | IX_MY_TABLE_K_ID |    17 |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(SYSDATE@!+0.00004>=SYSDATE@!-0.00004)
   2 - access("A1"."HDT"="A2"."HDT" AND 
              "A1"."GKID"="A2"."GKID")
   4 - access("A2"."C_DATE">=SYSDATE@!-0.00004 AND 
              "A2"."C_DATE"<=SYSDATE@!+0.00004)
   6 - access("A1"."K_ID"=U'123abc')

In the above statement, it can be seen that the index on C_DATE is used.

However, in the below statement, the index on C_DATE is not used. So, the query runs really slow.

Real Case:

explain plan for
SELECT *
  FROM SCHM.MY_TABLE A1, SCHM.MY_TABLE A2
 WHERE     (A1.K_ID = '123abc')
       AND A1.HDT = A2.HDT
       AND A2.C_DATE BETWEEN A1.C_DATE - 0.0004 
                           AND A1.C_DATE + 0.0004 
       AND A1.GKID = A2.GKID;


Plan hash value: 1063167343

----------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                  |  4187K|   998M|  6549K  (1)| 00:04:16 |
|*  1 |  HASH JOIN                            |                  |  4187K|   998M|  6549K  (1)| 00:04:16 |
|   2 |   JOIN FILTER CREATE                  | :BF0000          |    17 |  2125 |    21   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE          |    17 |  2125 |    21   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | IX_MY_TABLE_K_ID |    17 |       |     4   (0)| 00:00:01 |
|   5 |   JOIN FILTER USE                     | :BF0000          |  1429M|   166G|  6546K  (1)| 00:04:16 |
|*  6 |    TABLE ACCESS STORAGE FULL          | MY_TABLE          |  1429M|   166G|  6546K  (1)| 00:04:16 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access(A1.HDT=A2.HDT AND 
              A1.GKID=A2.GKID)
       filter(A2.C_DATE>=INTERNAL_FUNCTION(A1.C_DATE)-0.00004 AND 
              A2.C_DATE<=INTERNAL_FUNCTION(A1.C_DATE)+0.00004)
   4 - access(A1.K_ID=U'123abc')
   6 - storage(SYS_OP_BLOOM_FILTER(:BF0000,A2.HDT,A2.HDT))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,A2.HDT,A2.HDT))

If I use the hint /*+index(A2,IX_MY_TABLE_C_DATE )*/, everthing is fine, the index is used and the query runs fast as I want.

The query in the real case cannot be changed because it is created by application.

Index Information:
K_ID, not unique, position 1
HDT, not unique, position 1
C_DATE, not unique, position 1
ID Unique and Primary Key, position 1

What do I have to change in order the query in the real case to use index?

Upvotes: 2

Views: 982

Answers (3)

Connor McDonald
Connor McDonald

Reputation: 11616

Just to add, whenever you have a case where you

  1. have a slow SQL,
  2. have identified a hint that would make it faster
  3. cannot change the SQL because the source is not available

then this is a perfect case for SQL Plan Baselines. These let you lock a "good" plan against an existing SQL without touching the SQL statement itself.

The entire series describing SPM is at the links below, but thelinke to "part 4" walks through an exact example of what you want to achieve.

https://blogs.oracle.com/optimizer/sql-plan-management-part-1-of-4-creating-sql-plan-baselines https://blogs.oracle.com/optimizer/sql-plan-management-part-2-of-4-spm-aware-optimizer https://blogs.oracle.com/optimizer/sql-plan-management-part-3-of-4:-evolving-sql-plan-baselines https://blogs.oracle.com/optimizer/sql-plan-management-part-4-of-4:-user-interfaces-and-other-features

Upvotes: 1

The Impaler
The Impaler

Reputation: 48865

Well, the second query is slower since it's quite different from the first one. It has an extra join between tables:

AND A2.C_DATE BETWEEN A1.C_DATE - 0.0004 
                  AND A1.C_DATE + 0.0004

and on a million rows this takes a toll.

The first query doesn't have this join condition and both tables are:

  • Filtered first. This is fast using indexes: only 3 rows and 17 rows.
  • Joining them second. Joining 3 and 17 rows doesn't take any time.

The second query needs to perform:

  • A huge (hash) join first, that returns probably 100K+ rows.
  • A filtering later.

This is way slower.

I suggest adding the following indexes, and try again:

create index ix_1 (k_id);
create index ix_2 (hdt, gkid, c_date);

Upvotes: 2

Thorsten Kettner
Thorsten Kettner

Reputation: 95101

You have three join criteria (HDT, GKID, C_DATE) and 1 non-join criteria (K_ID) in your self-join. So for me it would seem natural, if the DBMS started with the records matching K_ID and then looked up all matching other records.

For this case I'd suggest the following indexes:

create index idx1 on my_table(k_id, hdt, gkid, c_date);
create index idx2 on my_table(hdt, gkid, c_date);

If there are just few records per k_id, I am sure that Oracle will use the indexes. If there are many, Oracle may still use the second one for a hash join.

Upvotes: 2

Related Questions