Reputation: 119
I'm running a basic query on a table with a unique ID in each table, which is a non-clustered index in each. One table has 8 million rows, and the other has 800,000 rows.
When I run the following, it returns 24,000 rows in less than a second:
select
a.[ID]
,b.[ID]
from
dbo.tbl_1 a
join
dbo.tbl_2 b
on
a.unique_id = b.unique_id
However, when I add an additional column in the join which will significantly reduce the recordset, this takes about 8 minutes
select
a.[ID]
,b.[ID]
from
dbo.tbl_1 a
join
dbo.tbl_2 b
on
a.unique_id = b.unique_id
AND a.code_letter = b.code_letter
The "code_letter" column is just one letter, and is set to varchar(1). I'm scratching my head trying to understand why this is hanging. The issue is that I've run a dynamic sql insert query with 20,000 permutations of joins, and it's taking way too long.
Edit
After trying this many ways, I realized that a simple select *
seems to work very efficiently, while selecting specific columns is the culprit. Here is the execution plan when I select *
:
Here is the execution plan when I select specific columns:
Again, my join is exactly the same, but the column selections are different.
Upvotes: 0
Views: 1273
Reputation: 1042
OP said, he didn't get the expected results, and based on his observations, I'll provide a different solution.
What I would do is, execute the following and get data to a temp table
select
a.[ID] as aID
,b.[ID] as bID
,a.code_letter as aCode_letter
,b.code_letter as bCode_letter
into #t
from
dbo.tbl_1 a
join
dbo.tbl_2 b
on
a.unique_id = b.unique_id
and then exec the following
Select aID, bID from #t Where aCode_letter = bCode_letter
select DISTINCT a.*
into #ta
from
dbo.tbl_1 a
join
dbo.tbl_2 b
on
a.unique_id = b.unique_id
select DISTINCT b.*
into #tb
from
dbo.tbl_1 a
join
dbo.tbl_2 b
on
a.unique_id = b.unique_id
And exec
Select a.ID, b.ID
from #ta a
Inner Join #tb b
on a.unique_id = b.unique_id and a.Code_letter = b.Code_letter
Upvotes: 1