Manngo
Manngo

Reputation: 16281

SQL Server: random number in WHERE clause

As far as I am aware, the only way to get a random value in a SELECT statement is by using the newid() function, as the random() function doesn’t generate new values for each row.

This leads to the following awkward construction to get a random number from, say 0 - 9:

abs(checksum(newid())) % 10

If I use this expression in the SELECT clause, it behaves as expected. However, if I try something like the following:

select *
from table
where abs(checksum(newid())) % 10>4;

I should have though that I would get roughly half the rows. Instead I get I get all or none of them. Apparently newid() is only evaluated once, instead of for each row.

The question is, how can I use a random number in the WHERE clause?

More

There is a similar question which asks for fixed number of rows at random. In the above example I could have used:

select top 50 percent from table order by newid();

which will get me what I am looking for.

The question remains, how can I use a random number in the WHERE clause. For example, is it possible to do something like this?

select *
from table
where code={random number};

Upvotes: 3

Views: 1296

Answers (3)

Manngo
Manngo

Reputation: 16281

The function newid() is calculate only once in the WHERE clause, not row by row. The trick is to force it to run row by row.

Of course it is possible to include it in a SELECT clause, and, in turn, include that in a CTE or a subquery, as per the other answers.

Microsoft offer a solution here: https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/ms189108(v=sql.105)?redirectedfrom=MSDN

The trick is to force newid() to recalculate by combining it with some row value. This is easily done in the checksum() function.

For example:

SELECT *
FROM table
WHERE abs(checksum(newid(),id)) % 10>4;

Upvotes: 1

TheGameiswar
TheGameiswar

Reputation: 28890

I should have though that I would get roughly half the rows. Instead I get I get all or none of them

You may get all of the rows or none of them ,since NEWID() is executed once per query when you use it in where clause..This is explained here by Conor Cunnigham and the technical term for this is called RumTimeConstants

You can look at your execution plan and look out for below expression

Const ConstValue

which you can see is calculated once and used throughout and finally you are doing just a boolean comparison,so you will end up with all rows or none

you have to use CTE Like the one stated in another answer or use Top with order by newid() or tablesample to return random rows

you may find Tablesample option more helpfull,since this may not go though all the table data to get only sample set of rows,unlike Newid()

below is one example on a table having 1000000 rows

select *  from Orders
TABLESAMPLE (50 PERCENT)

plan

enter image description here

Upvotes: 0

Pரதீப்
Pரதீப்

Reputation: 93694

Here is one way to get around the problem

SELECT *
FROM   (SELECT *,
               Abs(Checksum(Newid())) % 10 AS ran
        FROM   yourtable) a
WHERE  ran > 4; 

for some reason newid() in where clause it is executed only once and it is checked with the constant.

When I check the execution plan your query is missing compute scalar where as my query has compute scalar present in execution plan.

Upvotes: 1

Related Questions