Reputation: 153
Can someone explain why the following code does not produce a even distribution of values from 1 to 10:
declare @tbl table (id int, nm int)
;with src(id) as (
select 1 union all select id+1 from src where id+1 <= 100000
)
insert @tbl(id, nm)
select id, ROUND(((9) * RAND(cast(newid() as varbinary)) + 1), 0)
from src
option (maxrecursion 0)
select nm, count(9)qty
from @tbl
group by nm
Example Output:
nm qty
1 5523
2 11079
3 11190
4 11016
5 11026
6 11239
7 11149
8 11054
9 11243
10 5481
Notice that the quantity of 1's and 10's is about half the other numbers.
I have resorted to the following method to fix this flaw:
declare @tbl table (id int, nm int)
;with src(id) as (
select 1 union all select id+1 from src where id+1 <= 100000
)
insert @tbl(id, nm)
select id, (abs(checksum(newid()))%10)+1
from src
option (maxrecursion 0)
select nm, count(9)qty
from @tbl
group by nm
Example Output:
nm qty
1 10053
2 10146
3 10123
4 9939
5 9804
6 9895
7 9887
8 9907
9 10193
10 10053
As you can see the numbers 1 and 10 equally represented. Does anyone know why the first method fails?
Keith
Solved (sort of): I figured out why the first method fails. The use of round is the culprit. The float value generated will only map to 1 or 10 if the value is between .5 and 1 or 9.5 and 10 respectfully. The other values have a double the range. For example 2's range is 1.5 to 2.5. Now how do you fix the first method? Or do we just avoid it? The first method is shown as "The" method for generating integer values using rand() at many websites. I will post a fix when I figure one out.
Cheers!
Upvotes: 4
Views: 148
Reputation: 171206
Thanks to rounding only [1.0, 1.5)
will lead to a 1. But [1.5, 2.5)
will lead to a 2. That's twice as much of an interval length. And so on.
Don't use floats if you don't have to. Your second method is much better. The checksum(newid())
technique is what I consider best practice. (On a subjective note it's disappointing that we have to resort to such hacks to generate random integers.)
Upvotes: 3