OneClutteredMind
OneClutteredMind

Reputation: 459

Oracle: Optimizing twice self-join query

I've trying to get this query work efficiently over the past two days. I've learn more about Oracle Index behavior that I think I'm confused at this point what is supposed to work and what doesn't.

Basically, the query is summing up values and comparing to values from yesterday and last week.

I've played around with breaking it down, I've toyed in my mind analytic queries and changing order of indices but nothing seems to work. All my test has been on a table with 500K rows, as soon as I run it on a table with 20 Million rows it just takes forever.

Any help is greatly appreciated.

I modified the original post to help you help me. :)

CREATE TABLE TABLE_1
(ORDER_LINE_ID NUMBER, OFFSET NUMBER, BREAK_ID NUMBER, ZONE NUMBER, NETWORK NUMBER, HOUR_OF_DAY NUMBER, START_TIME DATE, END_TIME DATE, SUCCESS NUMBER
  CONSTRAINT "TABLE_1_PK" PRIMARY KEY (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, HOUR_OF_DAY))

-- SUCCESS is already aggregated during the insert
-- These are last week's records
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (1,0,1, 1, 1, 2016042001,'04/20/2016 00:00:00', '04/20/2016 02:00:00', 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (1,30,1, 1, 1, 2016042001,'04/20/2016 00:00:00', '04/20/2016 02:00:00', 2);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (2,0,1, 1, 1, 2016042001,'04/20/2016 00:00:00', '04/20/2016 02:00:00', 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (2,30,1, 1, 1, 2016042001,'04/20/2016 00:00:00', '04/20/2016 02:00:00', 1);

-- These are yesterday's records
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (3,0,1, 1, 1, 2016042601,'04/26/2016 00:00:00', '04/26/2016 02:00:00', 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (3,30,1, 1, 1, 2016042601,'04/26/2016 00:00:00', '04/26/2016 02:00:00', 2);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (4,0,1, 1, 1, 2016042601,'04/26/2016 00:00:00', '04/26/2016 02:00:00', 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (4,30,1, 1, 1, 2016042601,'04/26/2016 00:00:00', '04/26/2016 02:00:00', 1);

-- This is today's records
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (5,0,1, 1, 1, 2016042701,'04/27/2016 00:00:00', '04/27/2016 02:00:00', 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (5,30,1, 1, 1, 2016042701,'04/27/2016 00:00:00', '04/27/2016 02:00:00', 1);

-- Original twice join query
SELECT BREAK_ID, ORDER_LINE_ID, HOUR_OF_DAY, OFFSET, ZONE, NETWORK, START_TIME, END_TIME, SUM(SUCCESS), SUM(YESTERDAY_SUCCESS), SUM(LAST_WEEK_SUCCESS)
FROM TABLE_1 CURRENT_DAY
LEFT OUTER JOIN (
  SELECT SUM(SUCCESS) YESTERDAY_SUCCESS, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME FROM TABLE_1 
  GROUP BY ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME
) YESTERDAY 
  ON YESTERDAY.START_TIME + 1 = CURRENT_DAY.START_TIME
  AND YESTERDAY.END_TIME + 1 = CURRENT_DAY.END_TIME
  AND YESTERDAY.HOUR_OF_DAY = CURRENT_DAY.HOUR_OF_DAY
  AND YESTERDAY.NETWORK = CURRENT_DAY.NETWORK
  AND YESTERDAY.ZONE = CURRENT_DAY.ZONE
LEFT OUTER JOIN (
  SELECT SUM(SUCCESS) LAST_WEEK_SUCCESS, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME FROM TABLE_1
  GROUP BY ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME
  ) LAST_WEEK 
  ON YESTERDAY.START_TIME + 7 = CURRENT_DAY.START_TIME
  AND YESTERDAY.END_TIME + 7 = CURRENT_DAY.END_TIME
  AND YESTERDAY.HOUR_OF_DAY = CURRENT_DAY.HOUR_OF_DAY
  AND YESTERDAY.NETWORK = CURRENT_DAY.NETWORK
  AND YESTERDAY.ZONE = CURRENT_DAY.ZONE
GROUP BY BREAK_ID, ORDER_LINE_ID, HOUR_OF_DAY, OFFSET, ZONE, NETWORK, START_TIME, END_TIME;

-- Using Analytic Query (thank you to MT0)
SELECT BREAK_ID, ORDER_LINE_ID, HOUR_OF_DAY, OFFSET, ZONE, NETWORK, START_TIME, END_TIME, SUM(SUCCESS), SUM(YESTERDAY_SUCCESS), SUM(LAST_WEEK_SUCCESS)
FROM (
  SUM( SUCCESS )
    OVER ( PARTITION BY ZONE, NETWORK, HOUR_OF_DAY, TO_CHAR(START_TIME, 'HH24:MI:SS'), TO_CHAR(END_TIME, 'HH24:MI:SS')
           ORDER BY START_TIME
           RANGE BETWEEN INTERVAL '1' DAY PRECDEDING AND INTERVAL '1' DAY PRECEDING
          ) AS YESTERDAY_SUCCESS,
SUM ( SUCCESS )
    OVER ( PARTITION BY ZONE, NETWORK, HOUR_OF_DAY, TO_CHAR(START_TIME, 'HH24:MI:SS'), TO_CHAR(END_TIME, 'HH24:MI:SS')
           ORDER BY START_TIME
           RANGE BETWEEN INTERVAL '7' DAY PRECDEDING AND INTERVAL '7' DAY PRECEDING
         ) AS LAST_WEEK_SUCCESS
FROM TABLE_1 
) T1
WHERE SYSDATE - INTERVAL '12' HOUR <= START_TIME
AND START_TIME < SYSDATE - INTERVAL '1' HOUR
GROUP BY BREAK_ID, ORDER_LINE_ID, HOUR_OF_DAY, OFFSET, ZONE, NETWORK, START_TIME, END_TIME;

I must say thank you for the help to bring this question up to something I hope is more understandable. Everything works as expected but performance could use some tuning.

1.8 seconds on a table with 500K rows

400 seconds on a table with 20 million rows

i also want to add some Execution Plans provided by Oracle. I'm having trouble tuning performance.

-- using twice self join
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                           |      1 |        |     50 |00:00:00.84 |   99875 |    217 |   1705 |       |       |          |
|   1 |  HASH GROUP BY                  |                           |      1 |   6711 |     50 |00:00:00.84 |   99875 |    217 |   1705 |  1616K|   995K|          |
|*  2 |   FILTER                        |                           |      1 |        |    119K|00:00:00.65 |   99875 |      0 |      0 |       |       |          |
|   3 |    NESTED LOOPS OUTER           |                           |      1 |     54M|    119K|00:00:00.64 |   99875 |      0 |      0 |       |       |          |
|*  4 |     HASH JOIN OUTER             |                           |      1 |    109 |    119K|00:00:00.52 |   99875 |      0 |      0 |    13M|  2093K|     1/0/0|
|   5 |      TABLE ACCESS BY INDEX ROWID| TABLE_1_IDX               |      1 |    109 |    119K|00:00:00.14 |   85908 |      0 |      0 |       |       |          |
|*  6 |       INDEX RANGE SCAN          | START_TIME_IDX            |      1 |    109 |    119K|00:00:00.02 |     320 |      0 |      0 |       |       |          |
|   7 |      VIEW                       |                           |      1 |   1250 |  29311 |00:00:00.23 |   13967 |      0 |      0 |       |       |          |
|   8 |       HASH GROUP BY             |                           |      1 |   1250 |  29311 |00:00:00.22 |   13967 |      0 |      0 |  3008K|  1094K|     1/0/0|
|*  9 |        FILTER                   |                           |      1 |        |  88627 |00:00:00.20 |   13967 |      0 |      0 |       |       |          |
|* 10 |         TABLE ACCESS FULL       | TABLE_1                   |      1 |   1250 |  88627 |00:00:00.19 |   13967 |      0 |      0 |       |       |          |
|  11 |     VIEW                        |                           |    119K|    499K|      0 |00:00:00.10 |       0 |      0 |      0 |       |       |          |
|  12 |      SORT GROUP BY              |                           |    119K|    499K|      0 |00:00:00.08 |       0 |      0 |      0 |  1024 |  1024 |     1/0/0|
|* 13 |       FILTER                    |                           |    119K|        |      0 |00:00:00.02 |       0 |      0 |      0 |       |       |          |
|  14 |        TABLE ACCESS FULL        | TABLE_1                   |      0 |    499K|      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SYSDATE@!-17<SYSDATE@!-16)
   4 - access("YESTERDAY"."ZONE"="T1"."ZONE" AND "YESTERDAY"."NETWORK"="T1"."NETWORK" AND "YESTERDAY"."HOUR_OF_DAY"="T1"."HOUR_OF_DAY" 
              AND "T1"."END_TIME"=INTERNAL_FUNCTION("YESTERDAY"."END_TIME")+1 AND 
              "T1"."START_TIME"=INTERNAL_FUNCTION("YESTERDAY"."START_TIME")+1)
   6 - access("T1"."START_TIME">=SYSDATE@!-17 AND "T1"."START_TIME"<SYSDATE@!-16)
   9 - filter(SYSDATE@!-17<SYSDATE@!-16)
  10 - filter((INTERNAL_FUNCTION("START_TIME")+1>=SYSDATE@!-17 AND INTERNAL_FUNCTION("START_TIME")+1<SYSDATE@!-16))
  13 - filter(("YESTERDAY"."ZONE"="T1"."ZONE" AND "YESTERDAY"."NETWORK"="T1"."NETWORK" AND "YESTERDAY"."HOUR_OF_DAY"="T1"."HOUR_OF_DAY" 
              AND "T1"."END_TIME"=INTERNAL_FUNCTION("YESTERDAY"."END_TIME")+7 AND 
              "T1"."START_TIME"=INTERNAL_FUNCTION("YESTERDAY"."START_TIME")+7))

Another execution plan using Analytic Query (thank again to MT0)

-- using analytic query
-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                  |      1 |        |     50 |00:00:01.51 |   13967 |       |       |          |
|   1 |  HASH GROUP BY        |                  |      1 |    499K|     50 |00:00:01.51 |   13967 |    98M|  7788K|          |
|*  2 |   VIEW                |                  |      1 |    499K|    119K|00:00:01.15 |   13967 |       |       |          |
|   3 |    WINDOW SORT        |                  |      1 |    499K|    499K|00:00:01.43 |   13967 |    66M|  2823K|     1/0/0|
|*  4 |     FILTER            |                  |      1 |        |    499K|00:00:00.16 |   13967 |       |       |          |
|   5 |      TABLE ACCESS FULL| TABLE_1          |      1 |    499K|    499K|00:00:00.12 |   13967 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("T1"."START_TIME">=SYSDATE@!-INTERVAL'+17 00:00:00' DAY(2) TO SECOND(0) AND 
              "T1"."START_TIME"<SYSDATE@!-INTERVAL'+16 00:00:00' DAY(2) TO SECOND(0)))
   4 - filter(SYSDATE@!-INTERVAL'+17 00:00:00' DAY(2) TO SECOND(0)<SYSDATE@!-INTERVAL'+16 00:00:00' DAY(2) TO 
              SECOND(0))

As you can see, I added an index on start_time which self-Join query benefits from but estimates vs actuals are off. Where Analytic Query just decides it wants nothing to do with the index. Any ideas, points of references or help is greatly appreciated. Thanks in advance, everyone.

Upvotes: 0

Views: 84

Answers (1)

MT0
MT0

Reputation: 167981

It is unclear why you are joining only when there are rows with exactly the same time today and yesterday (or last week) but if you just want the rows that are between certain times then you can eliminate all the self-joins and do:

SELECT order_line,
       zone,
       network,
       sum(
        CASE WHEN SYSDATE - INTERVAL '12' HOUR <= start_time
             AND  start_time < SYSDATE - INTERVAL '1' HOUR
             THEN success
             END
       ) AS total_successes_today,
       sum(
        CASE WHEN SYSDATE - INTERVAL '12' HOUR <= start_time
             AND  start_time < SYSDATE - INTERVAL '1' HOUR
             THEN error
             END
       ) AS total_errors_today,
       sum(
        CASE WHEN SYSDATE - INTERVAL '36' HOUR <= start_time
             AND  start_time < SYSDATE - INTERVAL '25' HOUR
             THEN success
             END
       ) AS total_successes_yesterday,
       sum(
        CASE WHEN SYSDATE - INTERVAL '180' HOUR <= start_time
             AND  start_time < SYSDATE - INTERVAL '169' HOUR
             THEN success
             END
       ) AS total_successes_last_week
FROM   table_1
WHERE  (    SYSDATE - INTERVAL '12' HOUR <= start_time
        AND start_time < SYSDATE - INTERVAL '1' HOUR ) -- today
OR     (    SYSDATE - INTERVAL '36' HOUR <= start_time
        AND start_time < SYSDATE - INTERVAL '25' HOUR ) -- yesterday = today + 24 hours
OR     (    SYSDATE - INTERVAL '180' HOUR <= start_time
        AND start_time < SYSDATE - INTERVAL '169' HOUR ) -- last week = today + 7*24 hours

However, if you do want to keep the join on the start- and end-times then you could use an analytic query:

SELECT order_line,
       zone,
       network,
       SUM( success ),
       SUM( error ),
       SUM( yesterday_success ),
       SUM( last_week_success )
FROM   (
  SELECT t.*,
         SUM( success )
           OVER ( PARTITION BY id,
                               TO_CHAR( start_time, 'HH24:MI:SS' ),
                               TO_CHAR( end_time, 'HH24:MI:SS' )
                  ORDER BY start_time
                  RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING
                ) AS yesterday_success,
         SUM( success )
           OVER ( PARTITION BY id,
                               TO_CHAR( start_time, 'HH24:MI:SS' ),
                               TO_CHAR( end_time, 'HH24:MI:SS' )
                  ORDER BY start_time
                  RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND INTERVAL '7' DAY PRECEDING
                ) AS last_week_success
  FROM   TABLE_1 t
)
WHERE  SYSDATE - INTERVAL '12' HOUR <= start_time 
AND    start_time < SYSDATE - INTERVAL '1' HOUR
GROUP BY
       order_line,
       zone,
       network
ORDER BY
       order_line,
       zone,
       network

You can look at whether you would get speed improvements by using function-based indexes on TO_CHAR( start_time, 'HH24:MI:SS' ) and TO_CHAR( end_time, 'HH24:MI:SS' ).

Upvotes: 1

Related Questions