Reputation: 301
https://gist.github.com/anonymous/2463d5a8ee2849a6e1f5
Query 1 does not produce the expected results. However, queries 2 and 3 do. Why does moving the call to random() outside of the case statement matter?
Upvotes: 0
Views: 764
Reputation: 1270081
Consider the first expression:
select (case when round(random()*999999) + 1 between 000001 and 400000 then 1
when round(random()*999999) + 1 between 400001 and 999998 then 2
when round(random()*999999) + 1 between 999999 and 999999 then 3
else 4
end)
from generate_series(1, 8000000)
Presumably, you are thinking that the value "4" should almost never be selected. But, the problem is that random()
is being called separately for each when
clause.
So, the chance the it fails each clause is independent:
That means that about 24% of the time (60% * 40% * 99.9999%), the value "4" will appear. In actual fact, the first query returns "4" 23.98% of the time. To be honest, this is very close to the actual value, but given this size of data, but it is a bit further off than I would expect. However, it is close enough to explain what is happening.
Upvotes: 3