foobar0100
foobar0100

Reputation: 301

Why does calling random() inside of a case statement produce unexpected results?

https://gist.github.com/anonymous/2463d5a8ee2849a6e1f5

Query 1 does not produce the expected results. However, queries 2 and 3 do. Why does moving the call to random() outside of the case statement matter?

Upvotes: 0

Views: 764

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270081

Consider the first expression:

select (case when round(random()*999999) + 1 between 000001 and 400000 then 1
             when round(random()*999999) + 1 between 400001 and 999998 then 2
             when round(random()*999999) + 1 between 999999 and 999999 then 3
             else 4
        end)
from generate_series(1, 8000000)

Presumably, you are thinking that the value "4" should almost never be selected. But, the problem is that random() is being called separately for each when clause.

So, the chance the it fails each clause is independent:

  • About 60% of the time a random number will not match "1".
  • About 40% of the time a random number will not match "2".
  • About 99.9999% of the time a random number will not match "3" (I apologize if the number of nines is off, but the value is practically 1).

That means that about 24% of the time (60% * 40% * 99.9999%), the value "4" will appear. In actual fact, the first query returns "4" 23.98% of the time. To be honest, this is very close to the actual value, but given this size of data, but it is a bit further off than I would expect. However, it is close enough to explain what is happening.

Upvotes: 3

Related Questions