Keith Gresham
Keith Gresham

Reputation: 153

SQL Server random number generator is not random

Can someone explain why the following code does not produce a even distribution of values from 1 to 10:

declare @tbl table (id int, nm int)

;with src(id) as (
select 1 union all select id+1 from src where id+1 <= 100000
)
insert @tbl(id, nm)
select id, ROUND(((9) * RAND(cast(newid() as varbinary)) + 1), 0)
from src
option (maxrecursion 0)

select nm, count(9)qty
from @tbl
group by nm

Example Output:

nm  qty
1   5523
2   11079
3   11190
4   11016
5   11026
6   11239
7   11149
8   11054
9   11243
10  5481

Notice that the quantity of 1's and 10's is about half the other numbers.

I have resorted to the following method to fix this flaw:

declare @tbl table (id int, nm int)

;with src(id) as (
select 1 union all select id+1 from src where id+1 <= 100000
)
insert @tbl(id, nm)
select id, (abs(checksum(newid()))%10)+1
from src
option (maxrecursion 0)

select nm, count(9)qty
from @tbl
group by nm

Example Output:

nm  qty
1   10053
2   10146
3   10123
4   9939
5   9804
6   9895
7   9887
8   9907
9   10193
10  10053

As you can see the numbers 1 and 10 equally represented. Does anyone know why the first method fails?

Keith

Solved (sort of): I figured out why the first method fails. The use of round is the culprit. The float value generated will only map to 1 or 10 if the value is between .5 and 1 or 9.5 and 10 respectfully. The other values have a double the range. For example 2's range is 1.5 to 2.5. Now how do you fix the first method? Or do we just avoid it? The first method is shown as "The" method for generating integer values using rand() at many websites. I will post a fix when I figure one out.

Cheers!

Upvotes: 4

Views: 148

Answers (1)

usr
usr

Reputation: 171206

Thanks to rounding only [1.0, 1.5) will lead to a 1. But [1.5, 2.5) will lead to a 2. That's twice as much of an interval length. And so on.

Don't use floats if you don't have to. Your second method is much better. The checksum(newid()) technique is what I consider best practice. (On a subjective note it's disappointing that we have to resort to such hacks to generate random integers.)

Upvotes: 3

Related Questions