Code_Hunter
Code_Hunter

Reputation: 11

which one is better:Filter on join clause or Filter on where clause when joining two table?

I am trying to join 2 or more tables on MS SQL Server. All the tables have IsActive field which determines the active record. (IsActive = 1 means active record and IsActive = 0 means inactive record or record has been deleted from system)

So I have two conditions for joining the two or more tables.

On The first query,I filter the IsActive on the join clause

select * from table_A a
inner join table_B b
on a.ID = b.ID and b.IsActive = 1
inner join table_C c
on b.ID = c.ID and c.IsActive = 1
where a.IsActive = 1

On The second query, I also can filter IsActive on the where Clause

select * from table_A a
inner join table_B b
on a.ID = b.ID 
inner join table_C c
on b.ID = c.ID 
where a.IsActive = 1 and b.IsActive = 1
and c.IsActive = 1

notes: The relation from table A to B is one to one but from table A to C is one to many and also all the table has clustered index on primary key ID and the ID is auto increment.

So which one do you think is better? (assume each table has approximately 100.000 records (80% active records and 20% inactive records))

Thanks

Upvotes: 1

Views: 853

Answers (1)

Rachcha
Rachcha

Reputation: 8816

The difference is simple but takes a careful eye to spot.

Consider the following example:

create table tbl_client as
    select 1 as client_id, 'aaa' as client_name, 'Y' is_active from dual
    union all
    select 2, 'bbbbb', 'N' from dual
    union all
    select 3, 'cc', 'Y' from dual;

create table tbl_transaction as
    select 1 transaction_id, 1 client_id, 123.34 amount from dual
    union all
    select 2, 1, 4353.45 from dual
    union all
    select 3, 2, 251.48 from dual;

Now, on these tables run the followoing queries:

Inner Join:

In an inner join, there is no difference in the results of the following two queries:

select c.client_name, t.amount, t.is_paid
  from tbl_client c
       inner join tbl_transaction t
           on     c.client_id = t.client_id
              and t.is_paid = 'Y'; -- filter on join

select c.client_name, t.amount, t.is_paid
  from tbl_client c
       inner join tbl_transaction t
           on     c.client_id = t.client_id
 where t.is_paid = 'Y'; -- filter in where

Both their result is the same as:

CLIENT_NAME     AMOUNT IS_PAID
----------- ---------- -------
aaa             123.34 Y      
aaa            4353.45 Y

Left Outer Join

This is where the difference kicks in.

Consider the following query:

select c.client_name, t.amount, t.is_paid
  from tbl_client c
       left outer join tbl_transaction t
           on     c.client_id = t.client_id
              and t.is_paid = 'Y'; -- << filter in join

Result:

CLIENT_NAME     AMOUNT IS_PAID
----------- ---------- -------
aaa             123.34 Y      
aaa            4353.45 Y      
cc                             -- << Note that client cc's transaction record is not there
bbbbb                          -- << and this client also shows up

And when you apply filter on where in a left outer join:

select c.client_name, t.amount, t.is_paid
  from tbl_client c
       left outer join tbl_transaction t
           on     c.client_id = t.client_id
 where t.is_paid = 'Y';       -- << filter in where

Result:

CLIENT_NAME     AMOUNT IS_PAID
----------- ---------- -------
aaa             123.34 Y      
aaa            4353.45 Y      -- No row for bbbbb or cc clients, just like the inner join

Summary

In short, when you put a filter on the joining condition, the filter is applied to the table being joined. For example, in the first case in the left outer join section, the row for tbl_transaction didn't show up for the client bbbbb.

But when you put a filter in the where clause, it filters the entire data set that is retrieved after joining all the tables (logically. Internal technical operations differ across RDBMSes). This is why the rows for bbbbb and cc didn't show up in the last query.

Fiddle

EDIT

As @DanGuzmanSqlServerMvp has mentioned in his comment, for the example you have posted in your question, the SQL Server query optimizer should execute the same plan. However, if there was an outer join in your query, the plans would be different.

Upvotes: 1

Related Questions