Reputation: 459
I've trying to get this query work efficiently over the past two days. I've learn more about Oracle Index behavior that I think I'm confused at this point what is supposed to work and what doesn't.
Basically, the query is summing up values and comparing to values from yesterday and last week.
I've played around with breaking it down, I've toyed in my mind analytic queries and changing order of indices but nothing seems to work. All my test has been on a table with 500K rows, as soon as I run it on a table with 20 Million rows it just takes forever.
Any help is greatly appreciated.
I modified the original post to help you help me. :)
CREATE TABLE TABLE_1
(ORDER_LINE_ID NUMBER, OFFSET NUMBER, BREAK_ID NUMBER, ZONE NUMBER, NETWORK NUMBER, HOUR_OF_DAY NUMBER, START_TIME DATE, END_TIME DATE, SUCCESS NUMBER
CONSTRAINT "TABLE_1_PK" PRIMARY KEY (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, HOUR_OF_DAY))
-- SUCCESS is already aggregated during the insert
-- These are last week's records
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (1,0,1, 1, 1, 2016042001,'04/20/2016 00:00:00', '04/20/2016 02:00:00', 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (1,30,1, 1, 1, 2016042001,'04/20/2016 00:00:00', '04/20/2016 02:00:00', 2);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (2,0,1, 1, 1, 2016042001,'04/20/2016 00:00:00', '04/20/2016 02:00:00', 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (2,30,1, 1, 1, 2016042001,'04/20/2016 00:00:00', '04/20/2016 02:00:00', 1);
-- These are yesterday's records
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (3,0,1, 1, 1, 2016042601,'04/26/2016 00:00:00', '04/26/2016 02:00:00', 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (3,30,1, 1, 1, 2016042601,'04/26/2016 00:00:00', '04/26/2016 02:00:00', 2);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (4,0,1, 1, 1, 2016042601,'04/26/2016 00:00:00', '04/26/2016 02:00:00', 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (4,30,1, 1, 1, 2016042601,'04/26/2016 00:00:00', '04/26/2016 02:00:00', 1);
-- This is today's records
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (5,0,1, 1, 1, 2016042701,'04/27/2016 00:00:00', '04/27/2016 02:00:00', 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (5,30,1, 1, 1, 2016042701,'04/27/2016 00:00:00', '04/27/2016 02:00:00', 1);
-- Original twice join query
SELECT BREAK_ID, ORDER_LINE_ID, HOUR_OF_DAY, OFFSET, ZONE, NETWORK, START_TIME, END_TIME, SUM(SUCCESS), SUM(YESTERDAY_SUCCESS), SUM(LAST_WEEK_SUCCESS)
FROM TABLE_1 CURRENT_DAY
LEFT OUTER JOIN (
SELECT SUM(SUCCESS) YESTERDAY_SUCCESS, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME FROM TABLE_1
GROUP BY ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME
) YESTERDAY
ON YESTERDAY.START_TIME + 1 = CURRENT_DAY.START_TIME
AND YESTERDAY.END_TIME + 1 = CURRENT_DAY.END_TIME
AND YESTERDAY.HOUR_OF_DAY = CURRENT_DAY.HOUR_OF_DAY
AND YESTERDAY.NETWORK = CURRENT_DAY.NETWORK
AND YESTERDAY.ZONE = CURRENT_DAY.ZONE
LEFT OUTER JOIN (
SELECT SUM(SUCCESS) LAST_WEEK_SUCCESS, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME FROM TABLE_1
GROUP BY ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME
) LAST_WEEK
ON YESTERDAY.START_TIME + 7 = CURRENT_DAY.START_TIME
AND YESTERDAY.END_TIME + 7 = CURRENT_DAY.END_TIME
AND YESTERDAY.HOUR_OF_DAY = CURRENT_DAY.HOUR_OF_DAY
AND YESTERDAY.NETWORK = CURRENT_DAY.NETWORK
AND YESTERDAY.ZONE = CURRENT_DAY.ZONE
GROUP BY BREAK_ID, ORDER_LINE_ID, HOUR_OF_DAY, OFFSET, ZONE, NETWORK, START_TIME, END_TIME;
-- Using Analytic Query (thank you to MT0)
SELECT BREAK_ID, ORDER_LINE_ID, HOUR_OF_DAY, OFFSET, ZONE, NETWORK, START_TIME, END_TIME, SUM(SUCCESS), SUM(YESTERDAY_SUCCESS), SUM(LAST_WEEK_SUCCESS)
FROM (
SUM( SUCCESS )
OVER ( PARTITION BY ZONE, NETWORK, HOUR_OF_DAY, TO_CHAR(START_TIME, 'HH24:MI:SS'), TO_CHAR(END_TIME, 'HH24:MI:SS')
ORDER BY START_TIME
RANGE BETWEEN INTERVAL '1' DAY PRECDEDING AND INTERVAL '1' DAY PRECEDING
) AS YESTERDAY_SUCCESS,
SUM ( SUCCESS )
OVER ( PARTITION BY ZONE, NETWORK, HOUR_OF_DAY, TO_CHAR(START_TIME, 'HH24:MI:SS'), TO_CHAR(END_TIME, 'HH24:MI:SS')
ORDER BY START_TIME
RANGE BETWEEN INTERVAL '7' DAY PRECDEDING AND INTERVAL '7' DAY PRECEDING
) AS LAST_WEEK_SUCCESS
FROM TABLE_1
) T1
WHERE SYSDATE - INTERVAL '12' HOUR <= START_TIME
AND START_TIME < SYSDATE - INTERVAL '1' HOUR
GROUP BY BREAK_ID, ORDER_LINE_ID, HOUR_OF_DAY, OFFSET, ZONE, NETWORK, START_TIME, END_TIME;
I must say thank you for the help to bring this question up to something I hope is more understandable. Everything works as expected but performance could use some tuning.
1.8 seconds on a table with 500K rows
400 seconds on a table with 20 million rows
i also want to add some Execution Plans provided by Oracle. I'm having trouble tuning performance.
-- using twice self join
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | O/1/M |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 50 |00:00:00.84 | 99875 | 217 | 1705 | | | |
| 1 | HASH GROUP BY | | 1 | 6711 | 50 |00:00:00.84 | 99875 | 217 | 1705 | 1616K| 995K| |
|* 2 | FILTER | | 1 | | 119K|00:00:00.65 | 99875 | 0 | 0 | | | |
| 3 | NESTED LOOPS OUTER | | 1 | 54M| 119K|00:00:00.64 | 99875 | 0 | 0 | | | |
|* 4 | HASH JOIN OUTER | | 1 | 109 | 119K|00:00:00.52 | 99875 | 0 | 0 | 13M| 2093K| 1/0/0|
| 5 | TABLE ACCESS BY INDEX ROWID| TABLE_1_IDX | 1 | 109 | 119K|00:00:00.14 | 85908 | 0 | 0 | | | |
|* 6 | INDEX RANGE SCAN | START_TIME_IDX | 1 | 109 | 119K|00:00:00.02 | 320 | 0 | 0 | | | |
| 7 | VIEW | | 1 | 1250 | 29311 |00:00:00.23 | 13967 | 0 | 0 | | | |
| 8 | HASH GROUP BY | | 1 | 1250 | 29311 |00:00:00.22 | 13967 | 0 | 0 | 3008K| 1094K| 1/0/0|
|* 9 | FILTER | | 1 | | 88627 |00:00:00.20 | 13967 | 0 | 0 | | | |
|* 10 | TABLE ACCESS FULL | TABLE_1 | 1 | 1250 | 88627 |00:00:00.19 | 13967 | 0 | 0 | | | |
| 11 | VIEW | | 119K| 499K| 0 |00:00:00.10 | 0 | 0 | 0 | | | |
| 12 | SORT GROUP BY | | 119K| 499K| 0 |00:00:00.08 | 0 | 0 | 0 | 1024 | 1024 | 1/0/0|
|* 13 | FILTER | | 119K| | 0 |00:00:00.02 | 0 | 0 | 0 | | | |
| 14 | TABLE ACCESS FULL | TABLE_1 | 0 | 499K| 0 |00:00:00.01 | 0 | 0 | 0 | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYSDATE@!-17<SYSDATE@!-16)
4 - access("YESTERDAY"."ZONE"="T1"."ZONE" AND "YESTERDAY"."NETWORK"="T1"."NETWORK" AND "YESTERDAY"."HOUR_OF_DAY"="T1"."HOUR_OF_DAY"
AND "T1"."END_TIME"=INTERNAL_FUNCTION("YESTERDAY"."END_TIME")+1 AND
"T1"."START_TIME"=INTERNAL_FUNCTION("YESTERDAY"."START_TIME")+1)
6 - access("T1"."START_TIME">=SYSDATE@!-17 AND "T1"."START_TIME"<SYSDATE@!-16)
9 - filter(SYSDATE@!-17<SYSDATE@!-16)
10 - filter((INTERNAL_FUNCTION("START_TIME")+1>=SYSDATE@!-17 AND INTERNAL_FUNCTION("START_TIME")+1<SYSDATE@!-16))
13 - filter(("YESTERDAY"."ZONE"="T1"."ZONE" AND "YESTERDAY"."NETWORK"="T1"."NETWORK" AND "YESTERDAY"."HOUR_OF_DAY"="T1"."HOUR_OF_DAY"
AND "T1"."END_TIME"=INTERNAL_FUNCTION("YESTERDAY"."END_TIME")+7 AND
"T1"."START_TIME"=INTERNAL_FUNCTION("YESTERDAY"."START_TIME")+7))
Another execution plan using Analytic Query (thank again to MT0)
-- using analytic query
-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | O/1/M |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 50 |00:00:01.51 | 13967 | | | |
| 1 | HASH GROUP BY | | 1 | 499K| 50 |00:00:01.51 | 13967 | 98M| 7788K| |
|* 2 | VIEW | | 1 | 499K| 119K|00:00:01.15 | 13967 | | | |
| 3 | WINDOW SORT | | 1 | 499K| 499K|00:00:01.43 | 13967 | 66M| 2823K| 1/0/0|
|* 4 | FILTER | | 1 | | 499K|00:00:00.16 | 13967 | | | |
| 5 | TABLE ACCESS FULL| TABLE_1 | 1 | 499K| 499K|00:00:00.12 | 13967 | | | |
-------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("T1"."START_TIME">=SYSDATE@!-INTERVAL'+17 00:00:00' DAY(2) TO SECOND(0) AND
"T1"."START_TIME"<SYSDATE@!-INTERVAL'+16 00:00:00' DAY(2) TO SECOND(0)))
4 - filter(SYSDATE@!-INTERVAL'+17 00:00:00' DAY(2) TO SECOND(0)<SYSDATE@!-INTERVAL'+16 00:00:00' DAY(2) TO
SECOND(0))
As you can see, I added an index on start_time which self-Join query benefits from but estimates vs actuals are off. Where Analytic Query just decides it wants nothing to do with the index. Any ideas, points of references or help is greatly appreciated. Thanks in advance, everyone.
Upvotes: 0
Views: 84
Reputation: 167981
It is unclear why you are joining only when there are rows with exactly the same time today and yesterday (or last week) but if you just want the rows that are between certain times then you can eliminate all the self-joins and do:
SELECT order_line,
zone,
network,
sum(
CASE WHEN SYSDATE - INTERVAL '12' HOUR <= start_time
AND start_time < SYSDATE - INTERVAL '1' HOUR
THEN success
END
) AS total_successes_today,
sum(
CASE WHEN SYSDATE - INTERVAL '12' HOUR <= start_time
AND start_time < SYSDATE - INTERVAL '1' HOUR
THEN error
END
) AS total_errors_today,
sum(
CASE WHEN SYSDATE - INTERVAL '36' HOUR <= start_time
AND start_time < SYSDATE - INTERVAL '25' HOUR
THEN success
END
) AS total_successes_yesterday,
sum(
CASE WHEN SYSDATE - INTERVAL '180' HOUR <= start_time
AND start_time < SYSDATE - INTERVAL '169' HOUR
THEN success
END
) AS total_successes_last_week
FROM table_1
WHERE ( SYSDATE - INTERVAL '12' HOUR <= start_time
AND start_time < SYSDATE - INTERVAL '1' HOUR ) -- today
OR ( SYSDATE - INTERVAL '36' HOUR <= start_time
AND start_time < SYSDATE - INTERVAL '25' HOUR ) -- yesterday = today + 24 hours
OR ( SYSDATE - INTERVAL '180' HOUR <= start_time
AND start_time < SYSDATE - INTERVAL '169' HOUR ) -- last week = today + 7*24 hours
However, if you do want to keep the join on the start- and end-times then you could use an analytic query:
SELECT order_line,
zone,
network,
SUM( success ),
SUM( error ),
SUM( yesterday_success ),
SUM( last_week_success )
FROM (
SELECT t.*,
SUM( success )
OVER ( PARTITION BY id,
TO_CHAR( start_time, 'HH24:MI:SS' ),
TO_CHAR( end_time, 'HH24:MI:SS' )
ORDER BY start_time
RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING
) AS yesterday_success,
SUM( success )
OVER ( PARTITION BY id,
TO_CHAR( start_time, 'HH24:MI:SS' ),
TO_CHAR( end_time, 'HH24:MI:SS' )
ORDER BY start_time
RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND INTERVAL '7' DAY PRECEDING
) AS last_week_success
FROM TABLE_1 t
)
WHERE SYSDATE - INTERVAL '12' HOUR <= start_time
AND start_time < SYSDATE - INTERVAL '1' HOUR
GROUP BY
order_line,
zone,
network
ORDER BY
order_line,
zone,
network
You can look at whether you would get speed improvements by using function-based indexes on TO_CHAR( start_time, 'HH24:MI:SS' )
and TO_CHAR( end_time, 'HH24:MI:SS' )
.
Upvotes: 1