Reputation: 7549

Performance difference when put WHERE before or after JOIN

In Oracle, say if there is join between several large table, what is the performance difference between:

PLAN A:
SELECT A.a, B.b, C.c
FROM A join B on a.id = b.a
       join C on b.id = c.b
WHERE A.ax = '...' and B.bx = '...' and C.cx = '....';

PLAN B:
SELECT TA.a, TB.b, TC.c
FROM (
         SELECT A.a, A.id
         FROM A
         WHERE A.ax = '...'
     ) as TA
     join
     (
         SELECT B.b, B.a, B.id
         FROM B
         WHERE B.bx = '...'
     ) as TB on TA.id = TB.a
     join
     (
         SELECT C.c, C.b, 
         FROM C
         WHERE C.cx = '...'
     ) as TC on TB.id = TC.b;

PLAN A put condition after all table join together but PLAN B first generate each table's subset then join them together, can PLAN B perform better than PLAN A?

Upvotes: 2

Answers (4)

void

Reputation: 7890

I wanted to bring the execution plan of both above queries here and discuss on it but Pirate X explained it well so I'll not mention that and I'm totally agree with him but for me I experienced sometimes bringing the where clause in the join condition had better performance (in case of time) I mean maybe writing Plan A as following may be better, it has no cost you can test it:

SELECT A.a, B.b, C.c
FROM A join B on a.id = b.a and A.ax = '...'
       join C on b.id = c.b and B.bx = '...' and C.cx = '....';

in this way at least you have no where clause syntactically.

Upvotes: 0

Anton Shapovalov

Reputation: 9

First of all, plan depends on table's statistic, see dbms_stat. Right statistic ensure right query execution. For example DBMS_STATS.gather_schema_stats('SCOTT'); store statistic by schema see example

In the same time, there some options to improve plan:

do not use JOIN syntax, just "old" style "where". It force sql analyzer filter each table and then join. Hint /*+ leading() */ ensure tables will processed in right order:
```
SELECT /*+ leading(a b c) */ A.a, B.b, C.c
  FROM A, B, C 
 WHERE A.ax = '...' 
   and a.id = b.a 
   and B.bx = '...'
   and b.id = c.b 
   and c.cx = '....'; 
```

put filtering condition on each table just after "on" condition:

SELECT A.a, B.b, C.c
  FROM A join B on a.id = b.a and B.bx = '...' 
       join C on b.id = c.b and C.cx = '....'
 WHERE A.ax = '...';

If your tables really huge and you want to filter it, cache result and then join, you can use hint /*+ no_merge() */:

SELECT /*+ no_merge(TA) no_merge(TB) no_merge(TC) */ TA.a, TB.b, TC.c
  FROM (
  SELECT A.a, A.id
    FROM A
   WHERE A.ax = '...'
) as TA
join
(
 SELECT B.b, B.a, B.id
 FROM B
 WHERE B.bx = '...'
) as TB on TA.id = TB.a
join
(
 SELECT C.c, C.b, 
 FROM C
 WHERE C.cx = '...'
) as TC on TB.id = TC.b;

but is better to join 2 small tables, hash result, and then join third table:

SELECT /*+ no_merge(AB) */ AB.a, AB.b, TC.c
  FROM (
  SELECT /*+ use_hash(a b) */ A.a, A.id a_id, B.b, B.a b_a, B.id b_id
    FROM A, B
   WHERE A.ax = '...'
     AND a.id = b.a
     AND B.bx = '...'
 ) as AB,
 (
 SELECT C.c, C.b, 
   FROM C
  WHERE C.cx = '...'
 ) as TC 
 WHERE AB.b_id = TC.b;

Upvotes: 0

Pirate X

Reputation: 3093

Both of your queries depend on several factors like what is indexed and what is not, index strategy, server load, data caching, the way you've written query etc. I assume the columns in where clause are indexed while answering -

I ran both type of queries and found a fairly similar result. (This could probably be because of how my tables and columns are setup).

Explain plan for Plan A

SELECT STATEMENT  FIRST_ROWS
Cost: 9  Bytes: 446  Cardinality: 14

Explain plan for Plan B

SELECT STATEMENT  FIRST_ROWS
Cost: 12  Bytes: 448  Cardinality: 14

Now you see both took same amount of time but Ideally Plan A is what is considered under best practice (speaking from experience, I've seen Plan A queries being used almost everywhere while Plan B not so much)

Different queries work differently under different circumstances [and Oracle (or any SQL engine) smartly picks algorithm that would work best for you.]

EDIT - I checked with larger dataset now

Plan A and Plan B both are same

SELECT STATEMENT  FIRST_ROWS
    Cost: 35,413  Bytes: 1,888,512  Cardinality: 59,016

Upvotes: 1

Ross Bush

Reputation: 15175

Both queries will yield the almost identical query plans.

Plan A will spend more time performing Index Seeks(if using proper indexes) while spending less time performing table scans.

Plan B will spend more time performing Table Scans while less time performing Index Seeks (if using proper indexes).

Plan A is much more robust and compact.

There is not a straight forward answer and what I stated above is a cursory glance type of answer. The best way to determine the difference in your environment would be to test the outcome. The actual result is influenced on how tables are designed, how large the tables are, prior query executions, what has been cached, and a handful of other facets that are not brief enough to mention. The optimizer is usually pretty good at choosing the best plan.

Upvotes: 0

Performance difference when put WHERE before or after JOIN

Answers (4)

Related Questions