Matthew Baker
Matthew Baker

Reputation: 2729

Random values in iterative cte's

Using SQL Server 2016+

I have been having some difficulty in selecting random rows from a table which has been narrowed down to an issue with how random numbers are generated. For efficiency using a TOP 10 * ORDER BY NEWID() is to slow. In the code below I have used a set seed for repeatable results, but in live I would not.

Code sample:

SELECT  RAND(100) RN
,       RAND()
,       RAND()
,       RAND()
,       RAND();

SELECT  RAND(100) RN
UNION ALL SELECT  RAND()
UNION ALL SELECT  RAND()
UNION ALL SELECT  RAND()
UNION ALL SELECT  RAND();

WITH cte AS
    (SELECT 1         ID
     ,      RAND(100) RN
     UNION ALL
     SELECT cte.ID + 1
     ,      RAND()
       FROM cte
      WHERE ID < 5)
SELECT  RN
  FROM  cte;

Result Set

RN                                                                                          
---------------------- ---------------------- ---------------------- ---------------------- ----------------------
0.715436657367485      0.28463380767982       0.0131039082850364     0.28769876521071       0.100505471175005

(1 row affected)

RN
----------------------
0.715436657367485
0.28463380767982
0.0131039082850364
0.28769876521071
0.100505471175005

(5 rows affected)

RN
----------------------
0.715436657367485
0.28463380767982
0.28463380767982
0.28463380767982
0.28463380767982

(5 rows affected)

As you can see from the results when I call the RAND() function repeated in-line or via a union I get different results with each call. If however I call the function inside an iterative cte then I get the same value repeated.

This code is an example showing the problem, rather than the entire code set. I have created this purely as an example to demonstrate the problem. I have a solution based around a combination of Checksum & NewID() and the modulus calls and multiplication to give me values in the range I want, but this is fairly complex and seems excessive for just generating a list of random numbers in a range.

I'm looking for any guidance that can be offered on

  1. Why this is happening
  2. Any ways to solve the issue
  3. Other options for generating lists of random numbers (That aren't RBAR)

Many thanks.

Upvotes: 3

Views: 562

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270301

RAND() returns a "constant" value within a query. That is, it is evaluated once per "mention" in the query.

You can see this if you run:

select rand(), rand()
from (values (1), (2), (3)) v(x);

Each row has two values, which are different. However, the columns have the same values.

In any case, the canonical answer is to use RAND(CHECKSUM(NEWID())). This provides a random seed for RAND() on each invocation:

WITH cte AS
    (SELECT 1 as ID, RAND(CHECKSUM(NEWID())) as RN
     UNION ALL
     SELECT cte.ID + 1, RAND(CHECKSUM(NEWID())) as RN
     FROM cte
     WHERE ID < 5
    )
SELECT RN
FROM cte;

This is probably good enough to get a list of random values. I suspect that your real problem is different -- something like extracting random rows from a table. If that is the real problem, then ask a new question or check similar questions.

Upvotes: 1

Related Questions