Reputation: 13549
If you run the following sample code in SQL Server, you'll notice that newid() materializes after the join whereas row_number() materializes before the join. Does anyone understand this and if there's a way to work around it?
declare @a table ( num varchar(10) )
insert into @a values ('dan')
insert into @a values ('dan')
insert into @a values ('fran')
insert into @a values ('fran')
select *
from @a T
inner join
(select num, newid() id
from @a
group by num) T1 on T1.num = T.num
select *
from @a T
inner join
(select num, row_number() over (order by num) id
from @a
group by num) T1 on T1.num = T.num
Upvotes: 1
Views: 521
Reputation: 11
i had a similar problem and found out that the "inner join" was the problem. i was able to use "left joins" ...
Upvotes: 1
Reputation: 280262
Not sure I see what the problem is here. Materialize the subquery T1 first:
SELECT num, ROW_NUMBER() OVER (ORDER BY num)
FROM @a
GROUP BY num;
You get two rows:
dan 1
fran 2
Now join that against a on num = num, you get 4 rows, 2 for each distinct value. What is your actual goal here? Perhaps you should be applying ROW_NUMBER() outside?
The order of materialization is up to the optimizer. You'll find that other built-ins (RAND(), GETDATE() etc.) have similarly inconsistent materialization behavior. Not much you can do about it, and not much chance they're going to "fix" it.
EDIT
New code sample. Write the contents of @a to a #temp table to "materialize" the NEWID() assignment per unique num value.
SELECT num, id = NEWID()
INTO #foo FROM @a GROUP BY num;
SELECT a.num, f.id
FROM @a AS a
INNER JOIN #foo AS f
ON a.num = f.num;
DROP TABLE #foo;
Upvotes: 0