user582175

Reputation: 1244

High-performance weighted random choice for Python 2?

I have the following Python function, which selects a random element from the sequence "seq", weighted by a second sequence "weights" that holds a weight for each element of seq:

import random

def weighted_choice(seq, weights):
    assert len(seq) == len(weights)

    total = sum(weights)
    r = random.uniform(0, total)
    upto = 0
    for i in range(len(seq)):
        # return the first element whose cumulative weight reaches r
        if upto + weights[i] >= r:
            return seq[i]
        upto += weights[i]
    assert False, "Shouldn't get here"

If I call the above a million times with a 1000-element sequence, like this:

import random
import time

seq = range(1000)
weights = []
for i in range(1000):
    weights.append(random.randint(1, 100))

st = time.time()
for i in range(1000000):
    r = weighted_choice(seq, weights)
print (time.time() - st)

it runs for approximately 45 seconds on CPython 2.7 and 70 seconds on CPython 3.6. It finishes in around 2.3 seconds on PyPy 5.10, which would be fine for me, but unfortunately I can't use PyPy for various reasons.

Any ideas on how to speed up this function on CPython? I'm also interested in different implementations (algorithmic changes, or external libraries such as numpy) if they perform better.

PS: Python 3 has random.choices with weights; it runs for around 23 seconds, which is better than the above function, but still ten times slower than PyPy.
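
One direction I'm considering, since the weights don't change between calls, is to precompute the cumulative weights once and binary-search them with bisect. This is just a sketch (the factory name make_weighted_chooser is mine, and I haven't timed it above):

# Sketch only: precompute the cumulative weights once, then each draw is an
# O(log n) binary search instead of an O(n) scan.
import bisect
import random

def make_weighted_chooser(seq, weights):
    cum_weights = []
    total = 0
    for w in weights:
        total += w
        cum_weights.append(total)

    def choose():
        r = random.uniform(0, total)
        return seq[bisect.bisect_left(cum_weights, r)]

    return choose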

I've tried it with numpy this way:

import numpy
import time

weights = [1. / 1000] * 1000
st = time.time()
for i in range(1000000):
    #r = weighted_choice(seq, weights)
    #r = random.choices(seq, weights)
    r = numpy.random.choice(seq, p=weights)
print (time.time() - st)

It ran for 70 seconds.
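
I suspect the per-call overhead of numpy.random.choice dominates here. As a sketch (assuming the same seq and weights as above, and not timed), drawing all the samples in a single vectorized call avoids the Python-level loop entirely:

import numpy

seq = range(1000)
weights = [1. / 1000] * 1000  # must sum to 1 for the p parameter

# one vectorized call instead of a million Python-level calls
samples = numpy.random.choice(seq, size=1000000, p=weights)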

Upvotes: 0

Views: 701

Answers (2)

DJK

Reputation: 9274

You could take this approach with numpy. If you eliminate the for loop, you can get the true power of numpy by indexing the positions you need:

import numpy as np

# Setup (untimed, as in the question)
seq = np.arange(1000)
weights = np.random.randint(1, 100, (1000, 1))


def weights_numpy(seq, weights, iterations):
    """
    :param seq: Input sequence
    :param weights: Input weights
    :param iterations: Iterations to run
    :return: one weighted choice per iteration
    """
    r = np.random.uniform(0, weights.sum(0), (1, iterations))  # array of random draws
    ar = weights.cumsum(0)  # cumulative sum of the weights
    return seq[(ar >= r).argmax(0)]  # indices of seq that meet the condition

And the timing (Python 3, numpy 1.14.0):

%timeit weights_numpy(seq,weights,1000000)
4.05 s ± 256 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Which is a bit slower than PyPy, but not by much.
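
For illustration, a small usage sketch with the setup above (not part of the original timing):

# draws 10 weighted choices in one call; note that the boolean comparison
# builds a (len(seq), iterations) matrix, so memory grows with both dimensions
samples = weights_numpy(seq, weights, 10)
print(samples.shape)  # (10,)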

Upvotes: 0

FHTMitchell

Reputation: 12147

You can use numpy.random.choice (the p parameter takes the weights, which must sum to 1). numpy functions are generally vectorized and so run at near-C speed.

Implement as:

import numpy as np

def weighted_choice(seq, weights):
    w = np.asarray(weights, dtype=float)
    p = w / w.sum()  # normalize; can skip if weights already sum to 1
    return np.random.choice(seq, p=p)

Edit:

Timings:

%timeit np.random.choice(x, p=w)  # len(x) == 1_000_000
13 ms ± 238 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np.random.choice(y, p=w)  # len(y) == 100_000_000
1.28 s ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
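
If the same weights are reused for every draw, a further sketch (my addition, not benchmarked here) is to precompute the cumulative weights once and vectorize the lookup with np.searchsorted, which avoids redoing the normalization work on every call:

import numpy as np

def weighted_choice_many(seq, weights, n):
    # hypothetical helper: n weighted draws (with replacement) in one shot
    seq = np.asarray(seq)
    cum = np.cumsum(weights)            # cumulative weights, computed once
    r = np.random.uniform(0, cum[-1], size=n)
    return seq[np.searchsorted(cum, r)]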

Upvotes: 2
