zarpendorff

Reputation: 61

ORACLE Performance: (Hash) Join vs Subselect With Order By's in Huge Tables

I've got the following (abstract) problem in ORACLE 11g:

The overall picture looks like this:

[UML diagram: STRONG in an optional 1:1 relationship with WEAK, linked via the columns (pk, fk)]
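
For concreteness, here is a minimal DDL sketch of the two tables as I understand them; the column types, key layout, and constraint names are only illustrative, the real tables are more complex:

CREATE TABLE strong (
  pk          VARCHAR2(30) NOT NULL,
  fk          VARCHAR2(30),
  -- ... further columns of STRONG ...
  CONSTRAINT strong_pk PRIMARY KEY (pk)
);

CREATE TABLE weak (
  pk          VARCHAR2(30) NOT NULL,
  fk          VARCHAR2(30) NOT NULL,
  some_data1  VARCHAR2(100),
  some_data2  VARCHAR2(100),
  -- at most one WEAK row per (pk, fk) of STRONG (the 1:1 relationship)
  CONSTRAINT weak_pk PRIMARY KEY (pk, fk)
);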

Now, I have to define a view showing the rows of STRONG along with some additional columns from WEAK, linked via that 1:1 relationship. I tried two basic approaches:

Subselects

SELECT
(SELECT some_data1 FROM weak WHERE weak.pk = strong.pk AND weak.fk = strong.fk) some_data1,
(SELECT some_data2 FROM weak WHERE weak.pk = strong.pk AND weak.fk = strong.fk) some_data2
FROM strong

Left Outer Join

SELECT
weak.some_data1,
weak.some_data2
FROM strong
LEFT OUTER JOIN weak ON weak.pk = strong.pk AND weak.fk = strong.fk
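
For reference, the view on top of this looks roughly like the following sketch (the name the_view matches the queries below; the column list is shortened):

CREATE OR REPLACE VIEW the_view AS
SELECT
  strong.pk,
  strong.fk,
  weak.some_data1,
  weak.some_data2
FROM strong
LEFT OUTER JOIN weak
  ON  weak.pk = strong.pk
  AND weak.fk = strong.fk;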

I first thought that the "Left Outer Join" way had to be better, and I still think this is true as long as there is no WHERE/ORDER BY clause. However, in the real-world application, user inputs from a query dialog are dynamically translated into extensions of the above statements. Typically, the user knows the primary key of STRONG, resulting in queries like this:

SELECT *
FROM the_view
WHERE the_view.pk LIKE '123%' --Or even the exact key
ORDER BY the_view.pk

Using the "Left Outer Join"-way, we encountered some very serious performance problems, even though most of these SELECTs only return a few rows. I think what happened is that the hash table did not fit into the memory resulting in way too many I/O-events. Thus, we went back to the Subselects.

Now, I have a few questions:

Q1

Does Oracle have to compute the entire hash table for every SELECT (with ORDER BY)?

Q2

Why is the "Subselect" way faster? It might be worth noting here that these columns can appear in the WHERE clause as well.

Q3

Does it somehow matter that joining the two tables might potentially increase the number of selected rows? If so, can we somehow tell Oracle that this can never happen from a logical perspective?

Q4

In case the "Left Outer Join" way is not a well-performing option: the "Subselect" way does seem somewhat redundant. Is there a better way?

Thanks a lot!


EDIT

As requested, here is the execution plan of the actual case. Note that it refers to the real tables of the application, so the names differ from the abstract STRONG/WEAK example above:

The Execution Plan

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                         | Rows  | Bytes |TempSpc| Cost (%CPU)|
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                              |   717K|    73M|       | 13340   (2)|
|   1 |  HASH JOIN OUTER                 |                              |   717K|    73M|    66M| 13340   (2)|
|   2 |   VIEW                           |                              |   687K|    59M|       |     5   (0)|
|   3 |    NESTED LOOPS OUTER            |                              |   687K|    94M|       |     5   (0)|
|   4 |     NESTED LOOPS OUTER           |                              |     1 |   118 |       |     4   (0)|
|   5 |      TABLE ACCESS BY INDEX ROWID | Z                            |     1 |   103 |       |     3   (0)|
|   6 |       INDEX UNIQUE SCAN          | SYS_C00245876                |     1 |       |       |     2   (0)|
|   7 |      INDEX UNIQUE SCAN           | SYS_C00245876                |  1798K|    25M|       |     1   (0)|
|   8 |     VIEW PUSHED PREDICATE        | TV                           |   687K|    17M|       |     1   (0)|
|   9 |      NESTED LOOPS OUTER          |                              |     1 |    67 |       |     2   (0)|
|  10 |       TABLE ACCESS BY INDEX ROWID| T                            |     1 |    48 |       |     2   (0)|
|  11 |        INDEX UNIQUE SCAN         | SYS_C00245609                |     1 |       |       |     1   (0)|
|  12 |       INDEX UNIQUE SCAN          | SYS_C00254613                |     1 |    19 |       |     0   (0)|
|  13 |   TABLE ACCESS FULL              | CCPP                         |  5165K|    88M|       |  4105   (3)|
--------------------------------------------------------------------------------------------------------------

Upvotes: 1

Views: 894

Answers (1)

Marmite Bomber

Reputation: 21063

The real question is - how many records does your query return?

Only 10 records, or 10,000 (or 10M) of which you expect to see the first 10 rows quickly?

For the latter case, the subquery solution indeed works better, as you need no sort and you look up the WEAK table only a small number of times.

For the former case (i.e. the number of selected rows in both tables is small) I'd expect an execution plan like the following:

 --------------------------------------------------------------------------------------------
 | Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
 --------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT              |            |     4 |   336 |   100   (1)| 00:00:02 |
 |   1 |  SORT ORDER BY                |            |     4 |   336 |   100   (1)| 00:00:02 |
 |   2 |   NESTED LOOPS OUTER          |            |     4 |   336 |    99   (0)| 00:00:02 |
 |   3 |    TABLE ACCESS BY INDEX ROWID| STRONG     |     4 |   168 |    94   (0)| 00:00:02 |
 |*  4 |     INDEX RANGE SCAN          | STRONG_IDX |   997 |       |     4   (0)| 00:00:01 |
 |   5 |    TABLE ACCESS BY INDEX ROWID| WEAK       |     1 |    42 |     2   (0)| 00:00:01 |
 |*  6 |     INDEX UNIQUE SCAN         | WEAK_IDX   |     1 |       |     1   (0)| 00:00:01 |
 --------------------------------------------------------------------------------------------
 Predicate Information (identified by operation id):
 ---------------------------------------------------

    4 - access("STRONG"."PK" LIKE '1234%')
        filter("STRONG"."PK" LIKE '1234%')
    6 - access("WEAK"."PK"(+)="STRONG"."PK" AND "WEAK"."FK"(+)="STRONG"."FK")
        filter("WEAK"."PK"(+) LIKE '1234%')

If you see a FULL TABLE SCAN on one or the other table, the optimizer's impression could be that the predicate pk LIKE '123%' will return too many records, so that index access would be slower. This can be a good or a bad guess, so you may need to check your table statistics and cardinality estimation.
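
For example, fresh statistics can be gathered with DBMS_STATS (the table names below are just the abstract ones from your question):

BEGIN
  -- refresh table and index statistics so the optimizer can estimate
  -- predicates like pk LIKE '123%' from current data
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'STRONG', cascade => TRUE);
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'WEAK',   cascade => TRUE);
END;
/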

Some additional information follows

Q1

If Oracle performs a HASH JOIN, the whole data source (typically the smaller one) must be read into memory to build the hash table. This is either the whole table or the part of it filtered by the WHERE/ON clause (in your case, only the records with pk LIKE '123%').
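
If that hash table turns out to be too large, one experiment (not part of your original statements; the aliases are mine) is to ask the optimizer for a nested loops join instead of the hash join via hints:

SELECT /*+ LEADING(s) USE_NL(w) */
       w.some_data1,
       w.some_data2
FROM strong s
LEFT OUTER JOIN weak w
  ON  w.pk = s.pk
  AND w.fk = s.fk
WHERE s.pk LIKE '123%'
ORDER BY s.pk;

Whether this is actually faster depends on how many STRONG rows match the predicate.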

Q2

This may only be an impression, because you quickly see the first records. The subquery is performed only for the first few fetched rows.

To know the exact answer you must examine (or post) the execution plans.
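
For example (the LIKE pattern is just a placeholder):

EXPLAIN PLAN FOR
SELECT *
FROM the_view
WHERE the_view.pk LIKE '123%'
ORDER BY the_view.pk;

-- show the estimated plan
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- or, after actually running the statement with the GATHER_PLAN_STATISTICS hint,
-- show the plan of the last executed cursor including actual row counts
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));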

Q3

No, sorry, joining the two tables never "potentially" increases the number of selected rows; it returns exactly the number of rows defined by the SQL standard.

It is your responsibility to define the join on a unique / primary key to avoid duplication.
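
For illustration (the constraint name is made up): if (pk, fk) is not already the primary key of WEAK, a unique constraint on these join columns guarantees that the outer join can never multiply the rows of STRONG:

ALTER TABLE weak
  ADD CONSTRAINT weak_pk_fk_uk UNIQUE (pk, fk);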

Q4

You may of course select something like some_data1 || '#' || some_data2 in a single subquery, but it is your responsibility to decide whether that is safe.
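
A sketch of what that could look like; the '#' separator is only an example and must be a character that cannot occur in the data:

SELECT
  strong.pk,
  (SELECT weak.some_data1 || '#' || weak.some_data2
     FROM weak
    WHERE weak.pk = strong.pk
      AND weak.fk = strong.fk) AS combined_data
FROM strong;

The caller then has to split the combined column again, which is why this is only safe if the separator is unambiguous.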

Upvotes: 1
