Reputation: 16281
As far as I am aware, the only way to get a random value in a SELECT
statement is by using the newid()
function, as the random()
function doesn’t generate new values for each row.
This leads to the following awkward construction to get a random number from, say 0 - 9:
abs(checksum(newid())) % 10
If I use this expression in the SELECT
clause, it behaves as expected. However, if I try something like the following:
select *
from table
where abs(checksum(newid())) % 10>4;
I should have though that I would get roughly half the rows. Instead I get I get all or none of them. Apparently newid()
is only evaluated once, instead of for each row.
The question is, how can I use a random number in the WHERE
clause?
More
There is a similar question which asks for fixed number of rows at random. In the above example I could have used:
select top 50 percent from table order by newid();
which will get me what I am looking for.
The question remains, how can I use a random number in the WHERE
clause. For example, is it possible to do something like this?
select *
from table
where code={random number};
Upvotes: 3
Views: 1296
Reputation: 16281
The function newid()
is calculate only once in the WHERE
clause, not row by row. The trick is to force it to run row by row.
Of course it is possible to include it in a SELECT
clause, and, in turn, include that in a CTE or a subquery, as per the other answers.
Microsoft offer a solution here: https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/ms189108(v=sql.105)?redirectedfrom=MSDN
The trick is to force newid()
to recalculate by combining it with some row value. This is easily done in the checksum()
function.
For example:
SELECT *
FROM table
WHERE abs(checksum(newid(),id)) % 10>4;
Upvotes: 1
Reputation: 28890
I should have though that I would get roughly half the rows. Instead I get I get all or none of them
You may get all of the rows or none of them ,since NEWID() is executed once per query when you use it in where clause..This is explained here by Conor Cunnigham and the technical term for this is called RumTimeConstants
You can look at your execution plan and look out for below expression
Const ConstValue
which you can see is calculated once and used throughout and finally you are doing just a boolean comparison,so you will end up with all rows or none
you have to use CTE Like the one stated in another answer or use Top with order by newid() or tablesample to return random rows
you may find Tablesample option more helpfull,since this may not go though all the table data to get only sample set of rows,unlike Newid()
below is one example on a table having 1000000 rows
select * from Orders
TABLESAMPLE (50 PERCENT)
plan
Upvotes: 0
Reputation: 93694
Here is one way to get around the problem
SELECT *
FROM (SELECT *,
Abs(Checksum(Newid())) % 10 AS ran
FROM yourtable) a
WHERE ran > 4;
for some reason newid()
in where
clause it is executed only once and it is checked with the constant.
When I check the execution plan your query is missing compute scalar
where as my query has compute scalar present in execution plan.
Upvotes: 1