Reputation: 61
I've got the following (abstract) problem in ORACLE 11g
:
STRONG
and WEAK
.WEAK
is a compound consisting of the primary key of STRONG
(called pk
) plus an additional column (called fk
).1:N
relationship.N
WEAK
rows related to an entry of STRONG
that has a special relationship.1:1
relationship realised by adding the column fk
to STRONG
as well.The overall picture looks like this:
Now, I have to define a view showing rows of STRONG
along with some additional columns linked by that 1:1
relationship. I tried two basic approaches:
SELECT
(SELECT some_data1 FROM weak WHERE weak.pk = strong.pk AND weak.fk = strong.fk) some_data1,
(SELECT some_data2 FROM weak WHERE weak.pk = strong.pk AND weak.fk = strong.fk) some_data2
FROM strong
SELECT
weak.some_data1,
weak.some_data2
FROM strong
LEFT OUTER JOIN weak ON weak.pk = strong.pk AND weak.fk = strong.fk
I first thought that the "Left Outer Join"-way has to be better and I still think that this is true as long as there is no WHERE/ORDER_BY
-clause. However, in the real world application, user query dialog inputs are dynamically
translated into extensions of the above statements. Typically, the user knows the primary key of STRONG
resulting in queries like this:
SELECT *
FROM the_view
WHERE the_view.pk LIKE '123%' --Or even the exact key
ORDER BY the_view.pk
Using the "Left Outer Join"-way, we encountered some very serious performance problems, even though most of these SELECT
s only return a few rows. I think what happened is that the hash table did not fit into the
memory resulting in way too many I/O
-events. Thus, we went back to the Subselects.
Now, i have a few questions:
Does Oracle have to compute the entire hash table for every SELECT
(with ORDER_BY
)?
Why is the "Subselect"-way faster? Here, it might be worth noting that these columns can appear in the WHERE
-clause as well.
Does it somehow matter that joining the two tables might potentially increase the number of selcted rows? If so: Can we somehow tell Oracle that this can never happen from a logical perspective?
In case that the "Left Outer Join"-Way is not a well-performing option: The "Subselect"-way does seem somewhat redundant. Is there a better way?
Thanks a lot!
Due to request, I will add an explanation plan of the actual case. However, there are a few important things here:
STRONG => WEAK
-Join in a nested join (see below). The actual situation looks like this:
ZV
is the name of our target view - the explanation plan below refers to that view.Z
(~3M rows) join T
(~1M rows)T
joins CCPP
(~1M rows)TV
is a view based on T
. Come to think of it... this might be critical. The front end application sort of restricts us in the way we define these views: In ZV
, we have to join TV
instead of T
and we can not implement that T => CCPP
-join in TV
, forcing us to define the join TV => CCPP
as a nested join.---------------------------------------------------------------------------- ----------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
---------------------------------------------------------------------------- ----------------------------------
| 0 | SELECT STATEMENT | | 717K| 73M| | 13340 (2)|
| 1 | HASH JOIN OUTER | | 717K| 73M| 66M| 13340 (2)|
| 2 | VIEW | | 687K| 59M| | 5 (0)|
| 3 | NESTED LOOPS OUTER | | 687K| 94M| | 5 (0)|
| 4 | NESTED LOOPS OUTER | | 1 | 118 | | 4 (0)|
| 5 | TABLE ACCESS BY INDEX ROWID | Z | 1 | 103 | | 3 (0)|
| 6 | INDEX UNIQUE SCAN | SYS_C00245876 | 1 | | | 2 (0)|
| 7 | INDEX UNIQUE SCAN | SYS_C00245876 | 1798K| 25M| | 1 (0)|
| 8 | VIEW PUSHED PREDICATE | TV | 687K| 17M| | 1 (0)|
| 9 | NESTED LOOPS OUTER | | 1 | 67 | | 2 (0)|
| 10 | TABLE ACCESS BY INDEX ROWID| T | 1 | 48 | | 2 (0)|
| 11 | INDEX UNIQUE SCAN | SYS_C00245609 | 1 | | | 1 (0)|
| 12 | INDEX UNIQUE SCAN | SYS_C00254613 | 1 | 19 | | 0 (0)|
| 13 | TABLE ACCESS FULL | CCPP | 5165K| 88M| | 4105 (3)|
--------------------------------------------------------------------------------------------------------------
Upvotes: 1
Views: 894
Reputation: 21063
The real question is - how many records does your query return?
10 records only or 10.000 (or 10M) and you expect to see the first 10 rows quickly?
For the letter case the subquery solution works indeed better as you need no sort and you lookup the WEAK table only small number of times.
For the former case (i.e. the number of selected rows in both table is small) I'd expect execution plan as follows:
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 336 | 100 (1)| 00:00:02 |
| 1 | SORT ORDER BY | | 4 | 336 | 100 (1)| 00:00:02 |
| 2 | NESTED LOOPS OUTER | | 4 | 336 | 99 (0)| 00:00:02 |
| 3 | TABLE ACCESS BY INDEX ROWID| STRONG | 4 | 168 | 94 (0)| 00:00:02 |
|* 4 | INDEX RANGE SCAN | STRONG_IDX | 997 | | 4 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| WEAK | 1 | 42 | 2 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | WEAK_IDX | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("STRONG"."PK" LIKE '1234%')
filter("STRONG"."PK" LIKE '1234%')
6 - access("WEAK"."PK"(+)="STRONG"."PK" AND "WEAK"."FK"(+)="STRONG"."FK")
filter("WEAK"."PK"(+) LIKE '1234%')
If you see FULL TABLE SCAN on one or other table - the optimizes impression could be that the predicate pk LIKE '123%' will return too much records and the index access will be slower. This could be a good or bad guess, so you may need to check your table statistics and cardinality estimation.
Some additional information follows
Q1
If Oracle performs a HASH JOIN the whole data source (typically the smaller one) must be read in memory
in the hash table. This is the whole table or the part of it as filtered by the WHERE/ON clause.
(In your case only records with pk LIKE '123%' )
Q2
This may be only an impression, as you see quickly first records. The subquery is performed only for the first few fetched rows.
To know the exact answer you must examine (or post) the execution plans.
Q3
No, sorry, joining of the two tables NEVER potentially increase the number of selcted rows but returns exact the number of rows as defined in the SQL standard.
It is your responsibility to define the join on a unique / primary key to avoid duplication.
Q4
You may of course select something like some_data1 ||'#'||some_data2
in the subquery, but it is in your responsibility
to decide if it is safe..
Upvotes: 1