user2183200

Reputation: 11

How to efficiently JOIN HUGE tables using SQL OUTER JOIN

Oracle 10g 64 bit Red Hat Enterprise Linux 5 64bit

I currently have access to a normalized third-party database. It holds huge volumes of data, and my requirement is to expose a Materialized View built by joining a lot of tables.

Table 1: Example_Master
 Columns: MasterID (VARCHAR2(250)), MasterName (VARCHAR2(250))
 Rows: 90 million
 Primary Key: MasterID

Table 2: Example_ChildA1
 Columns: ChildA1ID (VARCHAR2(250)), MasterID (VARCHAR2(250))
 Rows: 25 million

Table 3: Example_ChildA1ID
 Columns: ChildA1ID (VARCHAR2(250)), ChildA1Name (VARCHAR2(250))
 Primary Key: ChildA1ID

Table 4: Example_ChildA2
 Columns: ChildA2ID (VARCHAR2(250)), MasterID (VARCHAR2(250))
 Rows: 35 million

Table 5: Example_ChildA2ID
 Columns: ChildA2ID (VARCHAR2(250)), ChildA2Name (VARCHAR2(250))
 Primary Key: ChildA2ID

etc.,

Each child table may or may not have an entry for a given MasterID. I have to get all MasterIDs and their related child names; if there is no matching value for any of the children, it has to return NULL. So I currently have the Materialized View query below:

 SELECT a.MasterName, c.ChildA1Name, e.ChildA2Name, g.ChildA3Name
 FROM
   Example_Master a,
   Example_ChildA1 b,
   Example_ChildA1ID c,
   Example_ChildA2 d,
   Example_ChildA2ID e,
   Example_ChildA3 f,
   Example_ChildA3ID g
 WHERE
   c.ChildA1ID(+) = b.ChildA1ID
   AND e.ChildA2ID(+) = d.ChildA2ID
   AND g.ChildA3ID(+) = f.ChildA3ID
   AND a.MasterID = b.MasterID(+)
   AND a.MasterID = d.MasterID(+)
   AND a.MasterID = f.MasterID(+)

I have to join about 5 more child tables in the same way, and the cost of this query has become so high that it takes close to 16 minutes to return results. Is there a better way to use OUTER JOINs? Please let me know if you need more details on my problem.

Thanks!

Upvotes: 1

Views: 8652

Answers (1)

David Aldridge

Reputation: 52386

When you equi-join large tables like this, the best plan you can hope for is a set of hash outer joins, the cost of which is generally the cost of the full table scans plus the overhead of large hash tables spilling to disk.
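
For illustration, here is roughly what that looks like written with ANSI outer joins (a sketch only -- the aliases follow the question, and the USE_HASH hint is just an assumption to make the intended hash plan explicit; the optimiser will usually choose it anyway for joins of this size):

 -- Same result as the (+) query in the question, with two child branches shown.
 -- Each LEFT OUTER JOIN keeps every MasterID and returns NULL names where
 -- no child row exists.
 SELECT /*+ USE_HASH(b c d e) */
        a.MasterName, c.ChildA1Name, e.ChildA2Name
 FROM Example_Master a
 LEFT OUTER JOIN Example_ChildA1   b ON b.MasterID  = a.MasterID
 LEFT OUTER JOIN Example_ChildA1ID c ON c.ChildA1ID = b.ChildA1ID
 LEFT OUTER JOIN Example_ChildA2   d ON d.MasterID  = a.MasterID
 LEFT OUTER JOIN Example_ChildA2ID e ON e.ChildA2ID = d.ChildA2ID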

The cost of the full table scans can only really be reduced if there are covering indexes.
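
For example (a hypothetical index, using the names from the question), an index that contains both the join key and the projected name column lets Oracle satisfy that side of the join with a fast full index scan instead of a full table scan:

 -- Covering index: holds ChildA1ID (join key) and ChildA1Name (projected column),
 -- so the table itself never needs to be read for this query.
 CREATE INDEX ix_childa1id_cov
   ON Example_ChildA1ID (ChildA1ID, ChildA1Name);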

The likelihood of the hash tables spilling to disk is of course reduced by having a large amount of memory available, but it can also be helped by equi-partitioning the tables on the join keys. Typically this is done with hash partitioning, using as many hash partitions as are required to keep each hash table in memory. More from the docs: https://docs.oracle.com/cd/B28359_01/server.111/b32024/part_avail.htm#CIHCDBIF (or search for "partition-wise join").
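
A rough sketch of what that equi-partitioning could look like (the table names, partition count, and CTAS approach here are illustrative assumptions, not part of the original schema):

 -- Hash-partition both sides of the join on MasterID with the same number
 -- of partitions; matching partitioning on the join key is what enables a
 -- full partition-wise join.
 CREATE TABLE Example_Master_part
   PARTITION BY HASH (MasterID) PARTITIONS 32
   AS SELECT * FROM Example_Master;

 CREATE TABLE Example_ChildA1_part
   PARTITION BY HASH (MasterID) PARTITIONS 32
   AS SELECT * FROM Example_ChildA1;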

Note that you don't need to use parallel query to benefit from partition-wise joins -- serial query also benefits.

By the way, be grateful that you're not on a sub-10g version -- you couldn't effectively equi-join with LARGE_TABLE LEFT OUTER JOIN SMALL_TABLE, as the optimiser couldn't access the small table first to build the hash table -- it would always be a nested loop :(

Upvotes: 5
