Why does newid() materialize at the very end of a query?

Question

If you run the following sample code in SQL Server, you'll notice that newid() materializes after the join whereas row_number() materializes before the join. Does anyone understand this and if there's a way to work around it?

declare @a table ( num varchar(10) )
 insert into @a values ('dan')
 insert into @a values ('dan')
 insert into @a values ('fran')
 insert into @a values ('fran')

 select *
   from @a T
            inner join
        (select num, newid() id
           from @a
          group by num) T1      on T1.num = T.num         

 select *
   from @a T
            inner join
        (select num, row_number() over (order by num) id
           from @a
          group by num) T1      on T1.num = T.num

Aaron Bertrand · Accepted Answer

Not sure I see what the problem is here. Materialize the subquery T1 first:

SELECT num, ROW_NUMBER() OVER (ORDER BY num) 
    FROM @a
    GROUP BY num;

You get two rows:

dan  1
fran 2

Now join that against a on num = num, you get 4 rows, 2 for each distinct value. What is your actual goal here? Perhaps you should be applying ROW_NUMBER() outside?

The order of materialization is up to the optimizer. You'll find that other built-ins (RAND(), GETDATE() etc.) have similarly inconsistent materialization behavior. Not much you can do about it, and not much chance they're going to "fix" it.

EDIT

New code sample. Write the contents of @a to a #temp table to "materialize" the NEWID() assignment per unique num value.

SELECT num, id = NEWID() 
  INTO #foo FROM @a GROUP BY num;

SELECT a.num, f.id 
  FROM @a AS a 
  INNER JOIN #foo AS f 
  ON a.num = f.num;

DROP TABLE #foo;

Why does newid() materialize at the very end of a query?

Answers (2)

Related Questions