Peter
Peter

Reputation: 1196

implicit inner joins - are they equal?

In my opinion these two SELECTs are exactly equal (but I want to rewrite the first for the second to help the optimizer in my case) - whatever data sits in tables a,b,c,d both SELECTs will produce exactly the same results. Do you agree? Thanks!

create table a (id number);
create table b (id number);
create table c (id number);
create table d (id number);

--Q1
select * from a,b,c,d
where a.id = b.id
and   a.id = c.id 
and   a.id = d.id;

--Q2
select * from a,b,c,d
where a.id = b.id
and   a.id = c.id 
and   a.id = d.id
      -- Q2 differs from Q1 only in the next 3 lines
and   b.id = c.id
and   b.id = d.id
and   c.id = d.id;

Upvotes: 1

Views: 400

Answers (3)

Lukas Eder
Lukas Eder

Reputation: 220842

One would expect Oracle to apply the rules of transitive closure and use the information you've provided it to prove that the excess predicates are unnecessary (and it often does so, as I've shown in this blog post), but it doesn't seem to do it in this particular case (without constraints or indexes). Getting the execution plans for these two queries:

SELECT s.sql_id, p.*
FROM v$sql s, TABLE (
  dbms_xplan.display_cursor (
    s.sql_id, s.child_number, 'ALLSTATS LAST'
  )
) p
WHERE s.sql_text LIKE '%a,b,c,d%'

Yields:

SQL_ID  d54mttn9psd29, child number 0
-------------------------------------
--Q2
 select * from a,b,c,d
 where a.id = b.id
 and   a.id = c.id 
 and 
  a.id = d.id
       -- Q2 differs from Q1 only in the next 3 lines
 
and   b.id = c.id
 and   b.id = d.id
 and   c.id = d.id
 
Plan hash value: 3564259801
 
-------------------------------------------------------------------------
| Id  | Operation            | Name | E-Rows |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |        |       |       |          |
|*  1 |  HASH JOIN           |      |      1 |  1068K|  1068K|  240K (0)|
|*  2 |   HASH JOIN          |      |      1 |  1209K|  1209K|  505K (0)|
|*  3 |    HASH JOIN         |      |      1 |  1506K|  1506K|  461K (0)|
|   4 |     TABLE ACCESS FULL| A    |      1 |       |       |          |
|   5 |     TABLE ACCESS FULL| B    |      1 |       |       |          |
|   6 |    TABLE ACCESS FULL | C    |      1 |       |       |          |
|   7 |   TABLE ACCESS FULL  | D    |      1 |       |       |          |
-------------------------------------------------------------------------

SQL_ID  3zzwv0z5tq84f, child number 0
-------------------------------------
--Q1
 select * from a,b,c,d
 where a.id = b.id
 and   a.id = c.id 
 and 
  a.id = d.id
 
Plan hash value: 255250992
 
---------------------------------------------------------------------------
| Id  | Operation              | Name | E-Rows |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |        |       |       |          |
|*  1 |  HASH JOIN             |      |      1 |  1068K|  1068K|  178K (0)|
|   2 |   MERGE JOIN CARTESIAN |      |      1 |       |       |          |
|   3 |    MERGE JOIN CARTESIAN|      |      1 |       |       |          |
|   4 |     TABLE ACCESS FULL  | B    |      1 |       |       |          |
|   5 |     BUFFER SORT        |      |      1 | 73728 | 73728 |          |
|   6 |      TABLE ACCESS FULL | C    |      1 |       |       |          |
|   7 |    BUFFER SORT         |      |      1 | 73728 | 73728 |          |
|   8 |     TABLE ACCESS FULL  | D    |      1 |       |       |          |
|   9 |   TABLE ACCESS FULL    | A    |      1 |       |       |          |
---------------------------------------------------------------------------

Which is surprising. Without having calculated statistics on the empty tables, the E-Rows value is correct, but the memory estimates are way off, and different for the two queries.

This doesn't mean that one approach is always better or even different from the other, but it might be. E.g. when I add actual data to the tables and calculate statistics:

INSERT INTO a SELECT LEVEL FROM dual CONNECT BY LEVEL <= 10000;
INSERT INTO b SELECT LEVEL FROM dual CONNECT BY LEVEL <= 10000;
INSERT INTO c SELECT LEVEL FROM dual CONNECT BY LEVEL <= 10000;
INSERT INTO d SELECT LEVEL FROM dual CONNECT BY LEVEL <= 10000;

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS('TEST', 'A');
  DBMS_STATS.GATHER_TABLE_STATS('TEST', 'B');
  DBMS_STATS.GATHER_TABLE_STATS('TEST', 'C');
  DBMS_STATS.GATHER_TABLE_STATS('TEST', 'D');
END;

Then, I'm getting the same execution plans with hash joins, not with merge joins:

SQL_ID  d54mttn9psd29, child number 0
-------------------------------------
--Q2
 select * from a,b,c,d
 where a.id = b.id
 and   a.id = c.id 
 and 
  a.id = d.id
       -- Q2 differs from Q1 only in the next 3 lines
 
and   b.id = c.id
 and   b.id = d.id
 and   c.id = d.id
 
Plan hash value: 2782485219
 
-------------------------------------------------------------------------
| Id  | Operation            | Name | E-Rows |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |        |       |       |          |
|*  1 |  HASH JOIN           |      |      1 |  1836K|  1836K| 2032K (0)|
|*  2 |   HASH JOIN          |      |      1 |  2161K|  2161K| 2145K (0)|
|   3 |    TABLE ACCESS FULL | C    |  10000 |       |       |          |
|*  4 |    HASH JOIN         |      |  10000 |  2546K|  2546K| 2211K (0)|
|   5 |     TABLE ACCESS FULL| A    |  10000 |       |       |          |
|   6 |     TABLE ACCESS FULL| B    |  10000 |       |       |          |
|   7 |   TABLE ACCESS FULL  | D    |  10000 |       |       |          |
-------------------------------------------------------------------------

SQL_ID  3zzwv0z5tq84f, child number 0
-------------------------------------
--Q1
 select * from a,b,c,d
 where a.id = b.id
 and   a.id = c.id 
 and 
  a.id = d.id
 
Plan hash value: 388154631
 
-------------------------------------------------------------------------
| Id  | Operation            | Name | E-Rows |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |        |       |       |          |
|*  1 |  HASH JOIN           |      |  10000 |  2546K|  2546K| 2099K (0)|
|   2 |   TABLE ACCESS FULL  | D    |  10000 |       |       |          |
|*  3 |   HASH JOIN          |      |  10000 |  2546K|  2546K| 2146K (0)|
|   4 |    TABLE ACCESS FULL | C    |  10000 |       |       |          |
|*  5 |    HASH JOIN         |      |  10000 |  2546K|  2546K| 2211K (0)|
|   6 |     TABLE ACCESS FULL| A    |  10000 |       |       |          |
|   7 |     TABLE ACCESS FULL| B    |  10000 |       |       |          |
-------------------------------------------------------------------------

The learning here is the same as always. Don't trust your intuition. Don't trust any old blog posts on the Internet. Check out execution plans and measure.

I used Oracle Database 18c Express Edition for this.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269703

I am going to address the question of whether those inequalities are always true. The answer is "no", not in SQL. Under most circumstances, they are equivalent. The problem arises with implicit type conversion.

In particular, if a.id is a number and other columns are strings, then you have the situation where:

1 = '1'        -- true
1 = '1.00'     -- true
'1' = '1.00'   -- false

You can see this on this db<>fiddle. It would be trivial to set this up using JOINs, but since I am not going to write code that has commas in the FROM clause, I'll leave that exercise to you.

In practice, id's used for joins should be of the same type. You cannot even declare a foreign key relationship if they are not. That best practice aside, the two queries are not automatically equivalent.

Note: This would be just as true if you used proper, explicit, standard JOIN syntax, which I strongly, strongly encourage you to learn and use exclusively.

Upvotes: 3

Rob Streeting
Rob Streeting

Reputation: 1735

Yes, results will be exactly the same. To think of it algebraically, if a = b and b = c, then a = c. This means that the final three conditions in the second query are redundant.

I would say it is better to stick with the original query though. At best, the query optimizer will need to perform more comparisons.

Upvotes: 1

Related Questions