Reputation: 11
I am trying to join 2 or more tables on MS SQL Server. All the tables have IsActive field which determines the active record. (IsActive = 1 means active record and IsActive = 0 means inactive record or record has been deleted from system)
So I have two conditions for joining the two or more tables.
On The first query,I filter the IsActive on the join clause
select * from table_A a
inner join table_B b
on a.ID = b.ID and b.IsActive = 1
inner join table_C c
on b.ID = c.ID and c.IsActive = 1
where a.IsActive = 1
On The second query, I also can filter IsActive on the where Clause
select * from table_A a
inner join table_B b
on a.ID = b.ID
inner join table_C c
on b.ID = c.ID
where a.IsActive = 1 and b.IsActive = 1
and c.IsActive = 1
notes: The relation from table A to B is one to one but from table A to C is one to many and also all the table has clustered index on primary key ID and the ID is auto increment.
So which one do you think is better? (assume each table has approximately 100.000 records (80% active records and 20% inactive records))
Thanks
Upvotes: 1
Views: 853
Reputation: 8816
The difference is simple but takes a careful eye to spot.
Consider the following example:
create table tbl_client as
select 1 as client_id, 'aaa' as client_name, 'Y' is_active from dual
union all
select 2, 'bbbbb', 'N' from dual
union all
select 3, 'cc', 'Y' from dual;
create table tbl_transaction as
select 1 transaction_id, 1 client_id, 123.34 amount from dual
union all
select 2, 1, 4353.45 from dual
union all
select 3, 2, 251.48 from dual;
Now, on these tables run the followoing queries:
Inner Join:
In an inner join, there is no difference in the results of the following two queries:
select c.client_name, t.amount, t.is_paid
from tbl_client c
inner join tbl_transaction t
on c.client_id = t.client_id
and t.is_paid = 'Y'; -- filter on join
select c.client_name, t.amount, t.is_paid
from tbl_client c
inner join tbl_transaction t
on c.client_id = t.client_id
where t.is_paid = 'Y'; -- filter in where
Both their result is the same as:
CLIENT_NAME AMOUNT IS_PAID
----------- ---------- -------
aaa 123.34 Y
aaa 4353.45 Y
Left Outer Join
This is where the difference kicks in.
Consider the following query:
select c.client_name, t.amount, t.is_paid
from tbl_client c
left outer join tbl_transaction t
on c.client_id = t.client_id
and t.is_paid = 'Y'; -- << filter in join
Result:
CLIENT_NAME AMOUNT IS_PAID
----------- ---------- -------
aaa 123.34 Y
aaa 4353.45 Y
cc -- << Note that client cc's transaction record is not there
bbbbb -- << and this client also shows up
And when you apply filter on where
in a left outer join:
select c.client_name, t.amount, t.is_paid
from tbl_client c
left outer join tbl_transaction t
on c.client_id = t.client_id
where t.is_paid = 'Y'; -- << filter in where
Result:
CLIENT_NAME AMOUNT IS_PAID
----------- ---------- -------
aaa 123.34 Y
aaa 4353.45 Y -- No row for bbbbb or cc clients, just like the inner join
Summary
In short, when you put a filter on the joining condition, the filter is applied to the table being joined. For example, in the first case in the left outer join section, the row for tbl_transaction
didn't show up for the client bbbbb
.
But when you put a filter in the where
clause, it filters the entire data set that is retrieved after joining all the tables (logically. Internal technical operations differ across RDBMSes). This is why the rows for bbbbb
and cc
didn't show up in the last query.
EDIT
As @DanGuzmanSqlServerMvp has mentioned in his comment, for the example you have posted in your question, the SQL Server query optimizer should execute the same plan. However, if there was an outer join in your query, the plans would be different.
Upvotes: 1