Reputation: 1611
Background. I'm using SQL Server. I have two tables in database:
Vendors(Id, Name, Description)
Products(Id, VendorId, Name, Description)
Values in Id
column are formatted with prefix 'ID_'
in Vendor
table.
Values in VendorId
column are formatted with prefix 'VE_'
in Products
table.
E.g. 'VE_001245'
in Products
refers to 'ID_001245'
in Vendors
.
(Please, do not propose to change this concept, don't care about the database scheme, and don't suggest adding a foreign key. All of it is just for illustration.)
Question: which one of the following queries is best in the performance context and why?
Use replace
function in inner select
:
select v.* from Vendors v
inner join
(
select distinct replace(VendorId, 'VE_', 'ID_') as Id
from Products
) list
on v.Id = list.Id
Use replace
function in on
-statement:
select v.* from Vendors v
inner join
(
select distinct VendorId as Id
from Products
) list
on v.Id = replace(list.Id, 'VE_', 'ID_')
There is only a clustered index in each table (by Id
column). Each table can contain millions of rows.
Upvotes: 1
Views: 17454
Reputation: 3466
Both the queries are almost same in terms of performance. In the first query sorting is done twice, once when you are selecting the distinct records and again when it is performing an inner join, and in the end a merge join is there to select the final result set. Whereas in second query sorting is done only once but Hash join is being performed which is more expensive then merge join. So both the queries are same performance wise in the scenario when you don't have any index on the table.
Upvotes: 0