Reputation: 344

How to improve performance in Oracle using SELECT DISTINCT

I'm currently working in the deployment of an OFBiz based ERP The database being used is Oracle 10g Enterprise

One of the biggest issues is some oracle performance problems, analyzing the ofbiz logs, the following query:

SELECT DISTINCT ORDER_ID, ORDER_TYPE_ID, ORDER_NAME, EXTERNAL_ID,
 SALES_CHANNEL_ENUM_ID, ORDER_DATE, ENTRY_DATE, VISIT_ID, STATUS_ID, CREATED_BY, 
 FIRST_ATTEMPT_ORDER_ID, CURRENCY_UOM, SYNC_STATUS_ID, BILLING_ACCOUNT_ID, 
 ORIGIN_FACILITY_ID, WEB_SITE_ID, PRODUCT_STORE_ID, TERMINAL_ID, TRANSACTION_ID, 
 AUTO_ORDER_SHOPPING_LIST_ID, NEEDS_INVENTORY_ISSUANCE, IS_RUSH_ORDER, INTERNAL_CODE, 
 REMAINING_SUB_TOTAL, GRAND_TOTAL, LAST_UPDATED_STAMP, LAST_UPDATED_TX_STAMP, CREATED_STAMP, 
CREATED_TX_STAMP, RECIBIR_BODEGAL, RECEPCIONADA_BODEGAL, FECHA_RECEPCION_BODEGAL FROM 
ERP.ORDER_HEADER WHERE ((STATUS_ID = :v0 OR STATUS_ID = :v1 OR STATUS_ID = :v2) AND 
(ORDER_TYPE_ID = :v3)) ORDER BY ORDER_DATE DESC

is very slow. We've tested executing the query without the DISTINCT and it takes about 30 seconds. There are 4.000.000+ registers in the table. There are index for the PK field orderId and almost every other field

The EXPLAIN PLAN with DISTINCT is:

SELECT STATEMENT () (null)
 SORT (ORDER BY)    (null)
  HASH (UNIQUE) (null)
   TABLE ACCESS (FULL)  ORDER_HEADER

and without the DISTINCT is:

SELECT STATEMENT () (null)
 SORT (ORDER BY)    (null)
  TABLE ACCESS (FULL)   ORDER_HEADER

any ideas about tuning oracle to improve the performance of this kind of queries? It's very difficult to rewrite the query because is automatically generated by ofbiz so I think the solution is about tuning oracle

thanks in advance

EDIT: I analyzed the query using tkprof ,as suggested by Rob van Wijk and haffax,and the result is the following

********************************************************************************

SELECT DISTINCT ORDER_ID, ORDER_TYPE_ID, ORDER_NAME, EXTERNAL_ID,
 SALES_CHANNEL_ENUM_ID, ORDER_DATE, ENTRY_DATE, VISIT_ID, STATUS_ID, CREATED_BY, 
 FIRST_ATTEMPT_ORDER_ID, CURRENCY_UOM, SYNC_STATUS_ID, BILLING_ACCOUNT_ID, 
 ORIGIN_FACILITY_ID, WEB_SITE_ID, PRODUCT_STORE_ID, TERMINAL_ID, TRANSACTION_ID, 
 AUTO_ORDER_SHOPPING_LIST_ID, NEEDS_INVENTORY_ISSUANCE, IS_RUSH_ORDER, INTERNAL_CODE, 
 REMAINING_SUB_TOTAL, GRAND_TOTAL, LAST_UPDATED_STAMP, LAST_UPDATED_TX_STAMP, CREATED_STAMP, 
CREATED_TX_STAMP, RECIBIR_BODEGAL, RECEPCIONADA_BODEGAL, FECHA_RECEPCION_BODEGAL FROM 
ERP.ORDER_HEADER WHERE STATUS_ID = 'ORDER_COMPLETED' ORDER BY ORDER_DATE DESC

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      9.10     160.81      66729      65203         37          50
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      9.14     160.83      66729      65203         37          50

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 58  

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       1        0.00          0.00
  db file scattered read                       8178        0.28        146.55
  direct path write temp                       2200        0.04          4.22
  direct path read temp                          36        0.14          2.01
  SQL*Net more data to client                     3        0.00          0.00
  SQL*Net message from client                     1        3.36          3.36
********************************************************************************

So it seems the problem is the 'db file scattered read', any ideas to how to tune oracle in order to reduce the wait in this event?

Follow up with the new tkprof result, this time closing the session:

********************************************************************************

SELECT DISTINCT ORDER_ID, ORDER_TYPE_ID, ORDER_NAME, EXTERNAL_ID,
 SALES_CHANNEL_ENUM_ID, ORDER_DATE, ENTRY_DATE, VISIT_ID, STATUS_ID, CREATED_BY,
 FIRST_ATTEMPT_ORDER_ID, CURRENCY_UOM, SYNC_STATUS_ID, BILLING_ACCOUNT_ID,
 ORIGIN_FACILITY_ID, WEB_SITE_ID, PRODUCT_STORE_ID, TERMINAL_ID, TRANSACTION_ID,
 AUTO_ORDER_SHOPPING_LIST_ID, NEEDS_INVENTORY_ISSUANCE, IS_RUSH_ORDER, INTERNAL_CODE,
 REMAINING_SUB_TOTAL, GRAND_TOTAL, LAST_UPDATED_STAMP, LAST_UPDATED_TX_STAMP, CREATED_STAMP,
CREATED_TX_STAMP, RECIBIR_BODEGAL, RECEPCIONADA_BODEGAL, FECHA_RECEPCION_BODEGAL FROM
ERP.ORDER_HEADER WHERE STATUS_ID = 'ORDER_COMPLETED' ORDER BY ORDER_DATE DESC

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.01          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        1      8.23      47.66      66576      65203         31          50
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      8.26      47.68      66576      65203         31          50

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 58 

Rows     Row Source Operation
-------  ---------------------------------------------------
     50  SORT ORDER BY (cr=65203 pr=66576 pw=75025 time=47666679 us)
3456659   TABLE ACCESS FULL ORDER_HEADER (cr=65203 pr=65188 pw=0 time=20757300 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       1        0.00          0.00
  db file scattered read                       8179        0.14         34.96
  direct path write temp                       2230        0.00          3.91
  direct path read temp                          52        0.14          0.84
  SQL*Net more data to client                     3        0.00          0.00
  SQL*Net message from client                     1     1510.62       1510.62
********************************************************************************

Upvotes: 4

Answers (6)

Sergey

Reputation: 31

Try to disable hash aggregation:

select /*+ no_use_hash_aggregation*/ distinct ...

http://oracle-randolf.blogspot.com/2011/01/hash-aggregation.html

Upvotes: 3

Rob van Wijk

Reputation: 17705

If the difference between the two queries is substantial, that would be surprising. You mention that the query without DISTINCT takes about 30 seconds. How much time does the query with the DISTINCT take?

Can you show the tkprof output of the query with the DISTINCT, after you traced the session with a "alter session set events '10046 trace name context forever, level 8'", and disconnect after the query has finished? This way we can see where time is actually being spent and if it was waiting for something ("direct path read temp" maybe?)

Regards, Rob.

Followup, after the tkprof file was posted:

I see you managed to get the tkprof output, but unfortunately you didn't disconnect your session before creating the tkprof file. Now, the cursor was left open and it failed to write STAT# lines to your trace file. This is why you don't have a plan / row source operation in your tkprof file. It would be nice if you can repeat the process, if the suggestion below turns out to be rubbish.

A little speculation from my side: I think the DISTINCT is almost a no-op because you are selecting so many columns. If this is true, then your predicate "WHERE STATUS_ID = 'ORDER_COMPLETED'" is very selective and you will benefit from having an index on this column. After you create the index, make sure you analyze it properly, maybe even with a histogram on if data values are skewed. The end result will be a different plan for this query, starting with an INDEX RANGE SCAN, followed by a TABLE ACCESS BY ROWID, leading to a very fast query.

After you have created an index, you can have the table re-analyzed, including histograms using this statement:

exec dbms_stats.gather_table_stats([owner],[table_name],cascade=>true,method_opt=>'FOR ALL INDEXED COLUMNS SIZE ')

Regards, Rob.

Upvotes: 1

dpbradley

Reputation: 11925

When troubleshooting applications where I don't have control of the SQL I find that the dbms_sqltune package saves a lot of time. See http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/d_sqltun.htm , and yes, unfortunately you should be licensed to use it.

There are procedures in this package to run a tuning analysis against a specific sql_id in the shared pool or the AWR repository. The analysis will contain indexing recommendations if there are any improvements to be made with additional indexes. More importantly, the analyzer might discover an improved access path that it can implement with what Oracle calls a SQL Profile - this is a set of hints that will be stored and used whenever this sql_id is executed. This happens without requiring the hints to be coded in the SQL statement, and there's also an option to do what you can think of as fuzzy matching if your application generates literal values in the statement instead of bind variables.

Of course this tool shouldn't be a substitute for understanding the application and it's data structures, but reading through the output that details the execution path of a better plan (if found) can be educational.

Upvotes: 0

Aurelio Martin Massoni

Reputation: 1706

Oracle is accessing the whole table each time you run the query ( TABLE ACCESS (FULL) ). Creating an INDEX on the STATUS_ID and ORDER_TYPE_ID columns

CREATE INDEX ERP.ORDER_HEADER_I1 ON ERP.ORDER_HEADER ( STATUS_ID, ORDER_TYPE_ID );

will help a lot, especially if there are several different values of STATUS_ID and ORDER_TYPE_ID in the ORDER_HEADER table.

Upvotes: -1

Tony Andrews

Reputation: 132630

Is ORDER_ID declared as the PK using a PRIMARY KEY constraint? Because if it is I would expect the optimiser to recognise that the DISTINCT is superfluous in this query and optimise it out. Without the constraint, it won't know it is superfluous and so will expend unnecessary and considerable effort in "de-duping" the results.

Upvotes: 1

haffax

Reputation: 6008

Since you're ordering results according to order_date it is important that you have a descending index on that field. Also tell oracle that you wish to use this index. Place the order_date field first in the query and use a hint like

SELECT /*+ index(HEADERS IDX_ORDER_DATE_DESC) */ ... 
FROM ERP.ORDER_HEADER HEADERS
WHERE ...
ORDER BY ORDER_DATE DESC

It is not so much about having indices, but rather about telling oracle to use them. Oracle is very picky about indices. You get the best results when you choose indices according to your most important queries. If in doubt, trace a query. This way you can see in what part of the query oracle spends the most time and whether your indices are actually picked up or not. Tracing is invaluable when fighting performance problems.

Upvotes: 2

How to improve performance in Oracle using SELECT DISTINCT

Answers (6)

Related Questions