Reputation: 7809
I have an list ordered by some quality function from which I'd like to take elements, preferring the good elements at the beginning of the list.
Currently, my function to generate the random indices looks essentially as follows:
def pick():
p = 0.2
for i in itertools.count():
if random.random() < p:
break
return i
It does a good job, but I wonder:
Upvotes: 3
Views: 806
Reputation: 20080
You could simulate it via exponential, but this is like making square peg fit round hole. As Mark said, it is geometric distribution - discrete, shifted by 1. And it is right here in the numpy:
import numpy as np
import random
import itertools
import matplotlib.pyplot as plt
p = 0.2
def pick():
for i in itertools.count():
if random.random() < p:
break
return i
q = np.random.geometric(p, size = 100000) - 1
z = [pick() for i in range(100000)]
bins = np.linspace(-0.5, 30.5, 32)
plt.hist(q, bins, alpha=0.2, label='geom')
plt.hist(z, bins, alpha=0.2, label='pick')
plt.legend(loc='upper right')
plt.show()
Output:
Upvotes: 2
Reputation: 36608
What you are describing sounds a lot like the exponential distribution. It already exists in the random
module.
Here is some code that takes just the integer part of sampling from an exponential distribution with a rate parameter of 100.
import random
import matplotlib.pyplot as plt
d = [int(random.expovariate(1/100)) for i in range(10000)]
h,b = np.histogram(d, bins=np.arange(0,max(d)))
plt.bar(left=b[:-1], height=h, ec='none', width=1))
plt.show()
Upvotes: 3
Reputation: 336
random.random()
defaults to a uniform distribution, but there are other methods within random
that would also work. For your given use case, I would suggest random.expovariate(2)
(Documentation, Wikipedia). This is an exponential distribution that will heavily prefer lower values. If you google some of the other methods listed in the documentation, you can find some other built-in distributions.
Edit: Be sure to play around with the argument value for expovariate
. Also note that it doesn't guarantee a value less than 1, so you might need to ensure that you only use values less than 1.
Upvotes: 1