Reputation: 33374
How do I select a random row from the database based on the probability chance assigned to each row.
Example:
Make Chance Value
ALFA ROMEO 0.0024 20000
AUDI 0.0338 35000
BMW 0.0376 40000
CHEVROLET 0.0087 15000
CITROEN 0.016 15000
........
How do I select random make name and its value based on the probability it has to be chosen.
Would a combination of rand()
and ORDER BY
work? If so what is the best way to do this?
Upvotes: 6
Views: 2042
Reputation: 2812
I had the same requirements and was trying to write a query for SQL Sever. My answer is base on @gordon-linoff's query of using cumulative sum for probabilities.
with rand AS (SELECT id, random=RAND() FROM Tiers)
, cumsum AS (SELECT id, cum_sum=SUM(probability) Over (Order By id) from Tiers)
Select TOP 1 t.id,
t.name,
t.probability
FROM Tiers t
inner join rand r on t.id = r.id
inner join cumsum c on t.id = c.id
WHERE c.cum_sum - r.random >= 0
ORDER BY c.cum_sum - r.random ASC
I have wrote a full summary to test this at https://github.com/AlahmadiQ8/cumulative-probability-sql
Here is a visual explanation of the cumulative sum probability
item | probability | cumulative |
---|---|---|
A | 0.2 | 0.2 |
B | 0.3 | 0.5 |
C | 0.5 | 1 |
If random number x = 0.45
, then most we should return item B
because x
is 0.2 < x <= 0.5
.
x
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
0.2 0.5 1
└──────────┘ └───────────────┘ └───────────────────────────┘
A B C
Upvotes: 0
Reputation: 1270401
You can do this by using rand()
and then using a cumulative sum. Assuming they add up to 100%:
select t.*
from (select t.*, (@cumep := @cumep + chance) as cumep
from t cross join
(select @cumep := 0, @r := rand()) params
) t
where @r between cumep - chance and cumep
limit 1;
Notes:
rand()
is called once in a subquery to initialize a variable. Multiple calls to rand()
are not desirable.limit 1
arbitrarily chooses 1.cumep > @r
.Upvotes: 8