Reputation: 995

Strange speed changes with sql query

I'm having trouble getting the correct query (oracle) for what I'm looking for. Essentially what I want is:

SELECT  count(ck.id)
FROM    claim_key ck
WHERE   (ck.dte_of_srvce > (SYSDATE - INTERVAL '30' DAY))
     AND ck.clm_type = 5
     AND ck.prgrm_id = 1

Explain:

| Id  | Operation                    |  Name            | Rows  | Bytes | Cost |
|   0 | SELECT STATEMENT             |                  |     1 |    14 |  3080|
|   1 |  SORT AGGREGATE              |                  |     1 |    14 |      |
|   2 |   TABLE ACCESS BY INDEX ROWID| CLAIM_KEY        |  6531 | 91434 |  3080|
|   3 |    INDEX SKIP SCAN           | I_CLAIM_KEY_001  |  1306K|       |  2813|

This query gets me what I want (the result on average being 20), but takes 10 minutes to run.

The following query is not quite complete, but runs much faster:

SELECT  count(ck.id)
FROM    claim_key ck
WHERE   (ck.dte_of_srvce > (SYSDATE - INTERVAL '30' DAY))
     AND ck.clm_type = 5

Explain:

| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
|   0 | SELECT STATEMENT     |             |     1 |    11 |  9195 |
|   1 |  SORT AGGREGATE      |             |     1 |    11 |       |
|   2 |   TABLE ACCESS FULL  | CLAIM_KEY   | 19592 |   210K|  9195 |

This returns about 20 as well, though this is just by fluke and I can't rely on it, I need to include the prgrm_id. The thing is that it only take 20 seconds.

The following query is not what I'm looking for but gives an idea of the performance:

SELECT  count(ck.id)
FROM    claim_key ck
WHERE   (ck.dte_of_srvce > (SYSDATE - INTERVAL '30' DAY))

| Id  | Operation             |  Name            | Rows  | Bytes | Cost  |
|   0 | SELECT STATEMENT      |                  |     1 |     8 |     4 |
|   1 |  SORT AGGREGATE       |                  |     1 |     8 |       |
|   2 |   INDEX FAST FULL SCAN| I_CLAIM_KEY_002  |   195K|  1530K|     4 |

This also takes 20 seconds, but it returns on average 700 records. The table claim_key is about 25 million rows.

There are multiple indexes on this table. They are:

IX_CLAIM_KEY_CREATED: CREATED_ON
I_CLAIM_KEY_001: CLNC_STE_ID, PRVDR_ID, PRGRM_ID, UPDATED_ON
I_CLAIM_KEY_002: SRCE_ID, PRVDR_ID, CLNC_ID, DTE_OF_SRVCE, PRGRM_ID
I_CLAIM_KEY_003: CLNT_ID, DTE_OF_SRVCE
I_CLAIM_KEY_004: TRNSMSN_ID, CLM_STTS
I_CLAIM_KEY_005: UPDATED_ON
I_CLAIM_KEY_006: PRVDR_ID, CMN_SRCE_ID
PK_CLAIM_ID: ID

What I want to know is why is adding the prgrm_id slowing it down so much? I would have expected it to be pretty quick since it would only have to search through the 700 rows specified by (ck.dte_of_srvce > (SYSDATE - INTERVAL '30' DAY)). Is that an incorrect assumption?

Edit

Using the hint /*+ FULL(ck) */ on the first query, its execution time drops and it generates the following plan.

| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
|   0 | SELECT STATEMENT     |             |     1 |    14 |  9195 |
|   1 |  SORT AGGREGATE      |             |     1 |    14 |       |
|   2 |   TABLE ACCESS FULL  | CLAIM_KEY   |  6531 | 91434 |  9195 |

Upvotes: 1

Answers (5)

Patrick Marchand

Reputation: 3445

To better understand what's going on, try this:

explain plan set statement_id = 'query1' for
SELECT  count(ck.id)
FROM    claim_key ck
WHERE   (ck.dte_of_srvce > (SYSDATE - INTERVAL '30' DAY))
     AND ck.clm_type = 5
     AND ck.prgrm_id = 1;

and then:

select *
from table(dbms_xplan.display(statement_id=>'query1'));

I'm guessing you'll see a line indicating TABLE ACCESS FULL on claim_key.

Then try:

explain plan set statement_id = 'query2' for
SELECT  count(ck.id)
FROM    claim_key ck
WHERE   (ck.dte_of_srvce > (SYSDATE - INTERVAL '30' DAY))
     AND ck.clm_type = 5;

select *
from table(dbms_xplan.display(statement_id=>'query2'));

and check to see what index it (presumably) using. That should give you an idea of what the database is doing which helps to figure out why it's doing it.

Ok, given your explain plans, it's a classic example of "indexes are not always good, tables scans are not always bad".

The INDEX SKIP SCAN is where the database can try to use an index even though the leading column of the index is not even used. Basically if your index looked like this (overly simplified):

COL1   COL2   ROWID
A      X      1        <--
A      Y      2
A      Z      3
B      X      4        <--
B      Y      5
B      Z      6

and your condition was WHERE col2 = 'X' the index skip scan says look through each combination in COL1 for where col2 = 'X'. It "skips" the values in col1 once it's found a match (e.g. col1 = A, col2 = X) down to where the value changes (col1 = B, then col1 = C, etc.) and looks for more matches.

The catch is that indexes (generally!) work like this: 1) find the next rowid in the index where the value was found 2) go to the table block with that rowid (TABLE ACCESS BY INDEX ROWID) 3) repeat until no more matches are found.

(For the skip scan, it would also incur the cost of finding out where the next change of value is for the leading columns.)

This is all well and good for a small number of rows, but suffers from the law of diminishing returns; it's not that great when you've got a large number of rows. That's because it has to read an index block, then a table block, then an index block, a table block (even if the table block was previously read.)

The full table scan just "plows" through the data thanks in part to...multiblock reads. The database can read many blocks from disk in a single read and doesn't read the same block more than once.

The INDEX FAST FULL SCAN is basically treating the I_CLAIM_KEY_002 as a table. All of what you need in the query can be answered by the index alone; no TABLE ACCESS is required. (I'm guessing I_CLAIM_KEY_002 is defined as clnt_id, dte_of_srvce and either clnt_id or dte_of_srvce is not nullable. Since ck.id should be a not null attribute, a count on ck.id is the same as a count on ck.clnt_id.)

So as for your initial query, unless you want to rejig your indexes, try this:

SELECT  /*+ FULL(ck) */ count(ck.id)
FROM    claim_key ck
WHERE   (ck.dte_of_srvce > (SYSDATE - INTERVAL '30' DAY))
     AND ck.clm_type = 5
     AND ck.prgrm_id = 1

which will force a full table scan on claim_key (ck) and you could see similar performance as the other two. (Check that this is the case first prefixing the query with "explain plan set statement_id = 'query_hint' for" and running the dbms_xplan query before you run it.)

(Now you'll ask "do I want to put in hints like that all the time"? Please don't. This is for a test only. This is just to check to see if a FTS is better than the INDEX SKIP SCAN. If it is, then you need to find out why. :)

Anyways...I hope that made snese..I mean sense.

Upvotes: 4

Brian Hooper

Reputation: 22084

What happens if you try something like...

SELECT COUNT(*)
    FROM (SELECT ck.id,
                 ck.prgrm_id AS prgrm_id
              FROM claim_key ck
              WHERE (ck.dte_of_srvce > (SYSDATE - INTERVAL '30' DAY)) AND
                    ck.clm_type = 5) AS sq
    WHERE sq.prgrm_id = 1;

I can't try this sort of thing at home so it may be no good, but it may help.

Upvotes: 0

DCookie

Reputation: 43533

Perhaps stating the obvious here, but are these results repeatable, or did you try this once, in the order specified in your question? If so then block caching could explain the differences.

Upvotes: 0

Xavier Combelle

Reputation: 11235

Maybe it's because cInt_id is only on your third index. And else it would use the second one

Upvotes: 0

Bob Jarvis - Слава Україні

Reputation: 50067

Looks to me like none of the indexes shown in your post are very useful. I'd expect it to do a full table scan no matter what.

If you can, add an index on

DTE_OF_SRVCE, CLM_TYPE, PRGRM_ID

and see if that helps. If you want to try for an index-only retrieval, add ID to the end of the index (so it'd be DTE_OF_SRVCE, CLM_TYPE, PRGRM_ID, ID).

Share and enjoy.

Upvotes: 1

Strange speed changes with sql query

Answers (5)

Related Questions