Oracle - Query Optimization - Query runs for a long time

I have a oracle query which is executed once a month to get the order details processed. This query is taking a painfully lot of time to execute. ( More than thirty mins ). Therefore I am trying to optimize this. I have a decent knowledge in Oracle and I will explain what I have tried so far. Still, it takes around 20 minutes to complete. This is the query. Oracle version is 11g.

SELECT store_typ, store_no, COUNT(order_no) FROM
(
    SELECT DISTINCT(order_no), store.store_no, store.store_typ FROM 
    (
        SELECT trx.order_no,trx.ADDED_DATE, odr.prod_typ, odr.store_no FROM daily_trx trx 
        LEFT OUTER JOIN 
        (
            SELECT odr.order_no,odr.prod_typ,prod.store_no FROM order_main odr 
            LEFT OUTER JOIN ORDR_PROD_TYP prod
            on odr.prod_typ = prod.prod_typ  
        ) odr
        ON trx.order_no=  odr.order_no
    ) daily_orders ,  
    (SELECT store_no,store_typ FROM main_stores ) store

    WHERE 1=1 
    and daily_orders.order_no !='NA'
    and store.store_no = daily_orders.store_no
    AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
    AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
)
GROUP BY store_typ, store_no

Background

order_main - This table has over 4 million records
I introduced index for order_no column which reduced time to execute.

My questions are as follows.

1) Will it help if I move date validation inside the inner query like this ?

SELECT store_typ, store_no, COUNT(order_no) FROM
(
    SELECT DISTINCT(order_no), store.store_no, store.store_typ FROM 
    (
        SELECT trx.order_no,trx.ADDED_DATE, odr.prod_typ, odr.store_no FROM daily_trx trx 

        LEFT OUTER JOIN 
        (
            SELECT odr.order_no,odr.prod_typ,prod.store_no FROM order_main odr 
            LEFT OUTER JOIN ORDR_PROD_TYP prod
            on odr.prod_typ = prod.prod_typ  
        ) odr
        ON trx.order_no=  odr.order_no
        WHERE  to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
        AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
    ) daily_orders ,  
    (SELECT store_no,store_typ FROM main_stores ) store

    WHERE 1=1 
    and daily_orders.order_no !='NA'
    and store.store_no = daily_orders.store_no
    --AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')
    --AND to_timestamp(to_char(daily_orders.ADDED_DATE,'DD-MM-YYYY HH24:MI:SS'),'DD-MM-YYYY HH24:MI:SS') <= to_date('31-05-2020 23:59:59','DD-MM-YYYY HH24:MI:SS')
)
GROUP BY store_typ, store_no

2) Could someone please suggest any other improvements that can be done to this query?

3) Additional indexing would help in any other tables / columns ? Only daily_trx and order_main tables are the tables that contains huge amount of data.

Upvotes: 0

Answers (2)

Marmite Bomber

Reputation: 21053

Performance Considereations

You process a month of data, so there will be probably a large number of transaction (say 100K+). In this case the best approach is to full scan the two large tables and perform HASH JOINs.

You can expect this execution plan

    ----------------------------------------------------------------------------------------------
| Id  | Operation            | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |               |   199K|  5850K|       |   592   (2)| 00:00:08 |
|*  1 |  HASH JOIN           |               |   199K|  5850K|       |   592   (2)| 00:00:08 |
|   2 |   TABLE ACCESS FULL  | MAIN_STORES   |    26 |   104 |       |     3   (0)| 00:00:01 |
|*  3 |   HASH JOIN          |               |   199K|  5070K|       |   588   (2)| 00:00:08 |
|   4 |    TABLE ACCESS FULL | ORDR_PROD_TYP |    26 |   104 |       |     3   (0)| 00:00:01 |
|*  5 |    HASH JOIN         |               |   199K|  4290K|  1960K|   584   (1)| 00:00:08 |
|*  6 |     TABLE ACCESS FULL| ORDER_MAIN    |   100K|   782K|       |    69   (2)| 00:00:01 |
|*  7 |     TABLE ACCESS FULL| DAILY_TRX     |   200K|  2734K|       |   172   (2)| 00:00:03 |
----------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("STORE"."STORE_NO"="PROD"."STORE_NO")
   3 - access("ODR"."PROD_TYP"="PROD"."PROD_TYP")
   5 - access("TRX"."ORDER_NO"="ODR"."ORDER_NO")
   6 - filter("ODR"."ORDER_NO"<>'NA')
   7 - filter("TRX"."ADDED_DATE"<TO_DATE(' 2020-06-01 00:00:00', 'syyyy-mm-dd 
              hh24:mi:ss') AND "TRX"."ORDER_NO"<>'NA' AND "TRX"."ADDED_DATE">=TO_DATE(' 2020-05-01 
              00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

If you have a partition option available you will massively profit by defining a monthly partitioning schema (alternative a daily partitioning) on the two tables DAILY_TRX and ORDER_MAIN.

If the above assumption is not correct and you have very few transactions in the selected time interval (say below 1K) - you will go better using the index access and NESTED LOOPS joins.

You will need this set of indices

create index daily_trx_date on daily_trx(ADDED_DATE);
 
create unique index order_main_idx on order_main (order_no);

create unique index ORDR_PROD_TYP_idx1 on ORDR_PROD_TYP(prod_typ);  

create unique index main_stores_idx1 on main_stores(store_no);

The expected plan is as follows

---------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |    92 |  2760 |    80   (4)| 00:00:01 |
|*  1 |  HASH JOIN                     |                  |    92 |  2760 |    80   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS BY INDEX ROWID  | DAILY_TRX        |    92 |  1288 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN            | DAILY_TRX_DATE   |    92 |       |     3   (0)| 00:00:01 |
|*  4 |   HASH JOIN                    |                  |   100K|  1564K|    75   (3)| 00:00:01 |
|   5 |    MERGE JOIN                  |                  |    26 |   208 |     6  (17)| 00:00:01 |
|   6 |     TABLE ACCESS BY INDEX ROWID| MAIN_STORES      |    26 |   104 |     2   (0)| 00:00:01 |
|   7 |      INDEX FULL SCAN           | MAIN_STORES_IDX1 |    26 |       |     1   (0)| 00:00:01 |
|*  8 |     SORT JOIN                  |                  |    26 |   104 |     4  (25)| 00:00:01 |
|   9 |      TABLE ACCESS FULL         | ORDR_PROD_TYP    |    26 |   104 |     3   (0)| 00:00:01 |
|* 10 |    TABLE ACCESS FULL           | ORDER_MAIN       |   100K|   782K|    69   (2)| 00:00:01 |
---------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("TRX"."ORDER_NO"="ODR"."ORDER_NO")
   2 - filter("TRX"."ORDER_NO"<>'NA')
   3 - access("TRX"."ADDED_DATE">=TO_DATE(' 2020-06-01 00:00:00', 'syyyy-mm-dd 
              hh24:mi:ss') AND "TRX"."ADDED_DATE"<TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd 
              hh24:mi:ss'))
   4 - access("ODR"."PROD_TYP"="PROD"."PROD_TYP")
   8 - access("STORE"."STORE_NO"="PROD"."STORE_NO")
       filter("STORE"."STORE_NO"="PROD"."STORE_NO")
  10 - filter("ODR"."ORDER_NO"<>'NA')

Check here how to get the execution plan of your query

Upvotes: 0

Marmite Bomber

Reputation: 21053

Some generall suggestions

Do not combine ANSI and Oracle Join Syntax in one Query
Do not use outer join if inner join can be used

Your inner subqueries use outer joins, but the final join to main_stores is an inner join eliminating all rows with store_no is null - you may use inner joins with the same result.

Filter rows early

A suboptimal practice is to first join in a subquery and than filter relevant row with where conditions

Use simple predicated

If you want to constraint a DATE column do it this way

trx.ADDED_DATE >= to_date('01-05-2020 00:00:00','DD-MM-YYYY HH24:MI:SS')

Use count distinct if appropriate

The select DISTINCTquery in the third line cam be eliminated if you use COUNT(DISTINCT order_no)

Applying all the above point I come to the following query

select 
  store.store_no, store.store_typ, count(DISTINCT trx.order_no) order_no_cnt
from daily_trx trx
join order_main odr on trx.order_no = odr.order_no
join ordr_prod_typ prod on odr.prod_typ = prod.prod_typ 
join main_stores store on store.store_no = prod.store_no
where trx.ADDED_DATE >=  date'2020-05-01' and
trx.ADDED_DATE < date'2020-06-01' and
trx.order_no !='NA'
group by store.store_no, store.store_typ

Upvotes: 2

Oracle - Query Optimization - Query runs for a long time

Answers (2)

Related Questions