ADT
ADT

Reputation: 43

Why does this SELECT statement take more than two hours?

I have two tables A & B. A has 6,760,636 records and B has 452,175,960 records. Here is the SELECT Statement I'm using:

SELECT  /*+ parallel (T,1) */
         T.*
     FROM     TABLE_A T,
              TABLE_B P 
    WHERE     T.DESTINATION = P.DESTINATION 
          AND T.SAIL_DATE = P.SAIL_DATE 
          AND T.PACKAGE_TYPE = P.PACKAGE_TYPE 
          AND T.CABIN_CATEGORY = P.CABIN_CATEGORY 
          AND T.BOOKING_SOURCE = P.BOOKING_SOURCE 
          AND T.FARE_TYPE = P.FARE_TYPE 
          AND T.POST_DATE = P.POST_DATE;

I tried creating the index on TABLE_A, but still it is not considering the INDEX and doing a FULL TABLE SCAN.

The EXPLAIN PLAN for above is

| Id  | Operation                    | Name                      | Rows  | Bytes | Cost (%CPU)|    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                           |  6760K|  1805M|   747K  (1)|        |      |            |
|   1 |  SORT AGGREGATE              |                           |     1 |    48 |            |        |      |            |
|   2 |   TABLE ACCESS BY INDEX ROWID| CL_PRICING_CONTROLS       |     1 |    48 |     2   (0)|        |      |            |
|   3 |    INDEX RANGE SCAN          | CL_PRICING_CONTROLS_IX3   |     1 |       |     1   (0)|        |      |            |
|   4 |  PX COORDINATOR              |                           |       |       |            |        |      |            |
|   5 |   PX SEND QC (RANDOM)        | :TQ10002                  |  6760K|  1805M|   747K  (1)|  Q1,02 | P->S | QC (RAND)  |
|   6 |    HASH JOIN                 |                           |  6760K|  1805M|   747K  (1)|  Q1,02 | PCWP |            |
|   7 |     PX RECEIVE               |                           |  6760K|  1437M|  1443   (1)|  Q1,02 | PCWP |            |
|   8 |      PX SEND HASH            | :TQ10001                  |  6760K|  1437M|  1443   (1)|  Q1,01 | P->P | HASH       |
|   9 |       PX BLOCK ITERATOR      |                           |  6760K|  1437M|  1443   (1)|  Q1,01 | PCWC |            |
|  10 |        TABLE ACCESS FULL     | TMP_RES_PRICE_CONTROL_111 |  6760K|  1437M|  1443   (1)|  Q1,01 | PCWP |            |
|  11 |     BUFFER SORT              |                           |       |       |            |  Q1,02 | PCWC |            |
|  12 |      PX RECEIVE              |                           |   450M|    23G|   746K  (1)|  Q1,02 | PCWP |            |
|  13 |       PX SEND HASH           | :TQ10000                  |   450M|    23G|   746K  (1)|        | S->P | HASH       |
|  14 |        INDEX FULL SCAN       | CL_PRICING_CONTROLS_IX1   |   450M|    23G|   746K  (1)|        |      |            |

I tried parallel query on both tables as below

SELECT  /*+ PARALLEL(T, 32)  PARALLEL(P, 32)*/
         T.*
     FROM     TABLE_A T,
              TABLE_B P 
    WHERE     T.DESTINATION = P.DESTINATION 
          AND T.SAIL_DATE = P.SAIL_DATE 
          AND T.PACKAGE_TYPE = P.PACKAGE_TYPE 
          AND T.CABIN_CATEGORY = P.CABIN_CATEGORY 
          AND T.BOOKING_SOURCE = P.BOOKING_SOURCE 
          AND T.FARE_TYPE = P.FARE_TYPE 
          AND T.POST_DATE = P.POST_DATE;

But this results in the EXPLAIN PLAN as below, where I see CPU with lower Cost, but still it is taking two hours.

| Id  | Operation                    | Name                      | Rows  | Bytes | Cost (%CPU)|    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                           |  6760K|  1805M| 59345   (1)|        |      |            |
|   1 |  SORT AGGREGATE              |                           |     1 |    48 |            |        |      |            |
|   2 |   TABLE ACCESS BY INDEX ROWID| CL_PRICING_CONTROLS       |     1 |    48 |     2   (0)|        |      |            |
|   3 |    INDEX RANGE SCAN          | CL_PRICING_CONTROLS_IX3   |     1 |       |     1   (0)|        |      |            |
|   4 |  PX COORDINATOR              |                           |       |       |            |        |      |            |
|   5 |   PX SEND QC (RANDOM)        | :TQ10002                  |  6760K|  1805M| 59345   (1)|  Q1,02 | P->S | QC (RAND)  |
|   6 |    HASH JOIN BUFFERED        |                           |  6760K|  1805M| 59345   (1)|  Q1,02 | PCWP |            |
|   7 |     PX RECEIVE               |                           |  6760K|  1437M|  1443   (1)|  Q1,02 | PCWP |            |
|   8 |      PX SEND HASH            | :TQ10000                  |  6760K|  1437M|  1443   (1)|  Q1,00 | P->P | HASH       |
|   9 |       PX BLOCK ITERATOR      |                           |  6760K|  1437M|  1443   (1)|  Q1,00 | PCWC |            |
|  10 |        TABLE ACCESS FULL     | TMP_RES_PRICE_CONTROL_111 |  6760K|  1437M|  1443   (1)|  Q1,00 | PCWP |            |
|  11 |     PX RECEIVE               |                           |   450M|    23G| 57858   (1)|  Q1,02 | PCWP |            |
|  12 |      PX SEND HASH            | :TQ10001                  |   450M|    23G| 57858   (1)|  Q1,01 | P->P | HASH       |
|  13 |       PX BLOCK ITERATOR      |                           |   450M|    23G| 57858   (1)|  Q1,01 | PCWC |            |
|  14 |        TABLE ACCESS FULL     | CL_PRICING_CONTROLS       |   450M|    23G| 57858   (1)|  Q1,01 | PCWP |            |

Upvotes: 1

Views: 89

Answers (1)

APC
APC

Reputation: 146209

Please look at your query, specifically at its WHERE clause. It consists entirely of join conditions. It could be rewritten using the ANSI-92 syntax to this:

SELECT  /*+ parallel (T,1) */
         T.*
FROM     TABLE_A T
          inner join TABLE_B P 
on     T.DESTINATION = P.DESTINATION 
      AND T.SAIL_DATE = P.SAIL_DATE 
      AND T.PACKAGE_TYPE = P.PACKAGE_TYPE 
      AND T.CABIN_CATEGORY = P.CABIN_CATEGORY 
      AND T.BOOKING_SOURCE = P.BOOKING_SOURCE 
      AND T.FARE_TYPE = P.FARE_TYPE 
      AND T.POST_DATE = P.POST_DATE;

So every row in each table has to be considered. Clearly full table scans are the only plausible access path.

Building a composite index on all columns in the JOIN clause is unlikely to change this. You are selecting all the columns from TABLE_A, so the database needs to visit the table anyway.

Unless the number of records in the intersection between the two tables is incredibly small it will be more efficient to read the tables in multi-block reads rather than index scans with table row look-ups. As it is, you're selecting almost one-in-six of all the rows in a half-billion row table. How would an index make that faster?


Incidentally, how did you decide on the degree of parallelism? How many CPUs does your server have? What other processes are running concurrently? What is the value of MAX_PARALLEL_SERVERS? Find out more.

Upvotes: 1

Related Questions