Reputation: 154
I have a oracle query which is executed once a month to get the order details processed. This query is taking a painfully lot of time to execute. ( More than thirty mins ). Therefore I am trying to optimize this. I have a decent knowledge in Oracle and I will explain what I have tried so far. Still, it takes around 20 minutes to complete. This is the query. Oracle version is 11g.
SELECT store_typ, store_no, COUNT(order_no) FROM
(
SELECT DISTINCT(order_no), store.store_no, store.store_typ FROM
(
SELECT trx.order_no,trx.ADDED_DATE, odr.prod_typ, odr.store_no FROM daily_trx trx
LEFT OUTER JOIN
(
SELECT odr.order_no,odr.prod_typ,prod.store_no FROM order_main odr
LEFT OUTER JOIN ORDR_PROD_TYP prod
on odr.prod_typ = prod.prod_typ
) odr
ON trx.order_no= odr.order_no
) daily_orders ,
(SELECT store_no,store_typ FROM main_stores ) store
WHERE 1=1
and daily_orders.order_no !='NA'
and store.store_no = daily_orders.store_no
AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
)
GROUP BY store_typ, store_no
Background
My questions are as follows.
1) Will it help if I move date validation inside the inner query like this ?
SELECT store_typ, store_no, COUNT(order_no) FROM
(
SELECT DISTINCT(order_no), store.store_no, store.store_typ FROM
(
SELECT trx.order_no,trx.ADDED_DATE, odr.prod_typ, odr.store_no FROM daily_trx trx
LEFT OUTER JOIN
(
SELECT odr.order_no,odr.prod_typ,prod.store_no FROM order_main odr
LEFT OUTER JOIN ORDR_PROD_TYP prod
on odr.prod_typ = prod.prod_typ
) odr
ON trx.order_no= odr.order_no
WHERE to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
) daily_orders ,
(SELECT store_no,store_typ FROM main_stores ) store
WHERE 1=1
and daily_orders.order_no !='NA'
and store.store_no = daily_orders.store_no
--AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
--AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
)
GROUP BY store_typ, store_no
2) Could someone please suggest any other improvements that can be done to this query?
3) Additional indexing would help in any other tables / columns ? Only daily_trx and order_main tables are the tables that contains huge amount of data.
Upvotes: 0
Views: 62
Reputation: 21053
Performance Considereations
You process a month of data, so there will be probably a large number of transaction (say 100K+). In this case the best approach is to full scan the two large tables and perform HASH JOIN
s.
You can expect this execution plan
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 199K| 5850K| | 592 (2)| 00:00:08 |
|* 1 | HASH JOIN | | 199K| 5850K| | 592 (2)| 00:00:08 |
| 2 | TABLE ACCESS FULL | MAIN_STORES | 26 | 104 | | 3 (0)| 00:00:01 |
|* 3 | HASH JOIN | | 199K| 5070K| | 588 (2)| 00:00:08 |
| 4 | TABLE ACCESS FULL | ORDR_PROD_TYP | 26 | 104 | | 3 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 199K| 4290K| 1960K| 584 (1)| 00:00:08 |
|* 6 | TABLE ACCESS FULL| ORDER_MAIN | 100K| 782K| | 69 (2)| 00:00:01 |
|* 7 | TABLE ACCESS FULL| DAILY_TRX | 200K| 2734K| | 172 (2)| 00:00:03 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("STORE"."STORE_NO"="PROD"."STORE_NO")
3 - access("ODR"."PROD_TYP"="PROD"."PROD_TYP")
5 - access("TRX"."ORDER_NO"="ODR"."ORDER_NO")
6 - filter("ODR"."ORDER_NO"<>'NA')
7 - filter("TRX"."ADDED_DATE"<TO_DATE(' 2020-06-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "TRX"."ORDER_NO"<>'NA' AND "TRX"."ADDED_DATE">=TO_DATE(' 2020-05-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
If you have a partition option available you will massively profit by defining a monthly partitioning schema (alternative a daily partitioning) on the two tables DAILY_TRX
and ORDER_MAIN
.
If the above assumption is not correct and you have very few transactions in the selected time interval (say below 1K) - you will go better using the index access and NESTED LOOPS
joins.
You will need this set of indices
create index daily_trx_date on daily_trx(ADDED_DATE);
create unique index order_main_idx on order_main (order_no);
create unique index ORDR_PROD_TYP_idx1 on ORDR_PROD_TYP(prod_typ);
create unique index main_stores_idx1 on main_stores(store_no);
The expected plan is as follows
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 92 | 2760 | 80 (4)| 00:00:01 |
|* 1 | HASH JOIN | | 92 | 2760 | 80 (4)| 00:00:01 |
|* 2 | TABLE ACCESS BY INDEX ROWID | DAILY_TRX | 92 | 1288 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | DAILY_TRX_DATE | 92 | | 3 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 100K| 1564K| 75 (3)| 00:00:01 |
| 5 | MERGE JOIN | | 26 | 208 | 6 (17)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| MAIN_STORES | 26 | 104 | 2 (0)| 00:00:01 |
| 7 | INDEX FULL SCAN | MAIN_STORES_IDX1 | 26 | | 1 (0)| 00:00:01 |
|* 8 | SORT JOIN | | 26 | 104 | 4 (25)| 00:00:01 |
| 9 | TABLE ACCESS FULL | ORDR_PROD_TYP | 26 | 104 | 3 (0)| 00:00:01 |
|* 10 | TABLE ACCESS FULL | ORDER_MAIN | 100K| 782K| 69 (2)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TRX"."ORDER_NO"="ODR"."ORDER_NO")
2 - filter("TRX"."ORDER_NO"<>'NA')
3 - access("TRX"."ADDED_DATE">=TO_DATE(' 2020-06-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "TRX"."ADDED_DATE"<TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
4 - access("ODR"."PROD_TYP"="PROD"."PROD_TYP")
8 - access("STORE"."STORE_NO"="PROD"."STORE_NO")
filter("STORE"."STORE_NO"="PROD"."STORE_NO")
10 - filter("ODR"."ORDER_NO"<>'NA')
Check here how to get the execution plan of your query
Upvotes: 0
Reputation: 21053
Some generall suggestions
Do not combine ANSI and Oracle Join Syntax in one Query
Do not use outer join if inner join can be used
Your inner subqueries use outer joins, but the final join to main_stores
is an inner join
eliminating all rows with store_no is null
- you may use inner joins with the same result.
A suboptimal practice is to first join in a subquery and than filter relevant row with where
conditions
If you want to constraint a DATE
column do it this way
trx.ADDED_DATE >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
count distinct
if appropriateThe select DISTINCT
query in the third line cam be eliminated if you use COUNT(DISTINCT order_no)
Applying all the above point I come to the following query
select
store.store_no, store.store_typ, count(DISTINCT trx.order_no) order_no_cnt
from daily_trx trx
join order_main odr on trx.order_no = odr.order_no
join ordr_prod_typ prod on odr.prod_typ = prod.prod_typ
join main_stores store on store.store_no = prod.store_no
where trx.ADDED_DATE >= date'2020-05-01' and
trx.ADDED_DATE < date'2020-06-01' and
trx.order_no !='NA'
group by store.store_no, store.store_typ
Upvotes: 2