Reputation: 10680
According to this post, the correct way of generating a random value in SQL Server is:
ABS(CHECKSUM(NewId())) % 14 -- Returns a value between 0 and 13
However, when using this expression in a case statement, such as:
SELECT
CASE ABS(CHECKSUM(NEWID())) % 4
WHEN 0 THEN 'String A'
WHEN 1 THEN 'String B'
WHEN 2 THEN 'String C'
WHEN 3 THEN 'String D'
END AS RandomString -- Returns String A, B, C, D and NULLs.
FROM sys.all_objects
The outputtet RandomString column contains some NULLs, as can be seen in this SQL fiddle. I have found that I can wrap the randomizing expression in a CTE to avoid NULLs in the output, but I still wonder why the code above returns NULLs?
WITH RandomNumber AS (
SELECT ABS(CHECKSUM(NEWID())) % 4 AS N FROM sys.all_objects
)
SELECT TOP 100
CASE N
WHEN 0 THEN 'String A'
WHEN 1 THEN 'String B'
WHEN 2 THEN 'String C'
WHEN 3 THEN 'String D'
END AS RandomString -- Does not return any NULLs. Only String A, B, C and D.
FROM RandomNumber
I have tried to generate the random number using a slightly different method, but the result is the same:
CAST(RAND(CHECKSUM(NEWID())) * 4 AS INT) -- Returns a value between 0 and 3
This seems to be a problem on SQL Server 2014, I have not tested it on other versions.
Upvotes: 5
Views: 2346
Reputation: 2248
This is a well thought out discussion on random numbers and nulls. The best answer and explanation I've seen on this problem is from Aaron Bertrand's article, "Dirty Secrets of the CASE Expression," which is published at sqlperformance.com. In my situation, I combined Query D with CROSS APPLY
to get the results I was looking for.
Upvotes: 0
Reputation: 44326
The expression is being rewritten internally to the same logic as
CASE WHEN ABS(CHECKSUM(NEWID())) % 4 = 1 THEN x
WHEN ABS(CHECKSUM(NEWID())) % 4 = 2 THEN y
WHEN ...
This is why the syntax in the question will never work.
The expression is being evaluated for each WHEN
.
The way you can fix it is:
SELECT
CASE x % 4
WHEN 0 THEN 'String A'
WHEN 1 THEN 'String B'
WHEN 2 THEN 'String C'
WHEN 3 THEN 'String D'
END AS RandomString -- Returns String A, B, C, D and NULLs.
FROM sys.all_objects
CROSS APPLY (SELECT ABS(CHECKSUM(NEWID())) x) y
Using CROSS APPLY will only calculate it once
Upvotes: 6
Reputation: 239724
It generates NULL
s because there's no guarantee on how many times a particular expression will be evaluated.
What you wanted was for SQL Server to do something along the lines of:
let x = GenerateRandomNumber()
if x = 1 then 'String 1'
if x = 2 then 'String 2'
if x = 3 then 'String 3'
if x = 4 then 'String 4'
(Where GenerateRandomNumber()
is ABS(CHECKSUM(NEWID())) % 4
); but what SQL Server actually did is:
if GenerateRandomNumber() = 1 then 'String 1'
if GenerateRandomNumber() = 2 then 'String 2'
if GenerateRandomNumber() = 3 then 'String 3'
if GenerateRandomNumber() = 4 then 'String 4'
So you only get a non-NULL
result if you happen to pick the right random number for one particular comparison operation.
I think that even with the CTE, there's no guarantee that SQL Server won't ever generate something like the second code block above. If you want a stable, generated once random number, you need to arrange to store that value somewhere (e.g. in a table variable or temp table).
The reason why I focus on guarantees is that you don't want to end up writing code based on current observed behaviour. There were a lot of "issues" reported when SQL Server 2008 stopped "ordering" results in views that we're using the TOP 100 PERCENT ... ORDER BY
trick - something that happened to work (mostly) on 2005 and earlier versions but ceased to do so.
Similarly, if someone asked my for an expression that returns the number 5
, I could supply them the expression DATEPART(day,GETUTCDATE())
and let them run as many queryies over as many rows as they like - for the next 8 hours - but that doesn't mean that I'd recommend it as a solution to their problem.
And also, we know that there are areas where SQL Server's decisions about evaluation order can be surprising.
Upvotes: 6
Reputation: 6018
Well as far as why, I'm not really sure. But as far as your null problem, making it a derived table and querying that worked. Don't really know why.
SELECT CASE rand_num
WHEN 0 THEN 'String A'
WHEN 1 THEN 'String B'
WHEN 2 THEN 'String C'
WHEN 3 THEN 'String D'
END AS RandomString -- Returns String A, B, C, D and NULLs.
FROM
(
SELECT ABS(CHECKSUM(NEWID())) % 4 rand_num
FROM sys.all_objects
) A
WHERE rand_num BETWEEN 0 AND 3
Upvotes: 0