SQL Multiple Inserts with RAND - how to change the RAND value

Question

I've got a statement that inserts values into a table based on a random value and another value

INSERT INTO TRAINER_SYNC (Steps, TrainerId, SyncDate)
SELECT AverageSteps * (RAND()*(1.15-0.85)+0.85), Id, GETDATE()
FROM TRAINER

This works fine, but whenever two "Trainers" have the same "AverageSteps", the number inserted into the table is always the same.

For example, if all the trainers have average steps of 10000, here is the result in the TRAINER_SYNC table

Id  TrainerId   SyncDate                    Steps
10  1           2018-10-20 18:42:16.407     9482
11  2           2018-10-20 18:42:16.407     9482
12  3           2018-10-20 18:42:16.407     9482
13  4           2018-10-20 18:42:16.407     9482
14  5           2018-10-20 18:42:16.407     9482
15  6           2018-10-20 18:42:16.407     9482
16  7           2018-10-20 18:42:16.407     9482
17  8           2018-10-20 18:42:16.407     9482
18  9           2018-10-20 18:42:16.407     9482

Is there a way to get the RAND() in the insert statement to be more... "random"? I'm not sure what it is called, but reset the initial seed so each iteration of the insert gives me a different random value?

Gordon Linoff · Accepted Answer

rand() is a special class of function that is evaluated exactly once for each call in the SQL statement. This can be really confusing, because:

select rand(), rand()

returns two different values. However,

select rand(), rand()
from (values (1), (2), (3)) v(x)

returns three rows -- with the same two values on each of the rows. This is an optimization "feature" of SQL Server. I believe that getdate() (and similar date/time functions) are the only other functions with this behavior.

The normal solution is to seed the random number generator with newid(). But, the return types are different, so checksum() is used to convert the id to a number:

INSERT INTO TRAINER_SYNC (Steps, TrainerId, SyncDate)
    SELECT AverageSteps * (RAND(CHECKSUM(NEWID()))*(1.15-0.85)+0.85), 
           Id, GETDATE()
    FROM TRAINER;

You could use a different seed, if one is available. For instance, if only one row is going in per trainer, you could use rand(id). I should note that rand() is not particularly "random" with seeds, so adjacent seeds often produce similar numbers.

You probably should not be setting GETDATE() in the INSERT. You can create the table with:

SyncDate datetime default getdate()

So the default value is the insertion time.

SQL Multiple Inserts with RAND - how to change the RAND value

Answers (2)

Related Questions