user582175

Reputation: 1244

High-performance weighted random choice for Python 2?

I have the following Python function, which selects a random element from the sequence "seq", weighted by a second sequence "weights" that holds a weight for each element of seq:

import random

def weighted_choice(seq, weights):
    assert len(seq) == len(weights)

    total = sum(weights)
    r = random.uniform(0, total)
    upto = 0
    for i in range(len(seq)):
        # return the first element whose cumulative weight reaches r
        if upto + weights[i] >= r:
            return seq[i]
        upto += weights[i]
    assert False, "Shouldn't get here"

If I call the above a million times with a 1000-element sequence, like this:

import random
import time

seq = range(1000)
weights = []
for i in range(1000):
    weights.append(random.randint(1, 100))

st = time.time()
for i in range(1000000):
    r = weighted_choice(seq, weights)
print (time.time() - st)

it runs for approximately 45 seconds on CPython 2.7 and 70 seconds on CPython 3.6. It finishes in around 2.3 seconds on PyPy 5.10, which would be fine for me, but unfortunately I can't use PyPy for various reasons.

Any ideas on how to speed up this function on CPython? I'm also interested in different implementations (algorithmic changes, or external libraries such as numpy) if they perform better.

PS: Python 3 has random.choices with weights; it runs for around 23 seconds, which is better than the above function, but still ten times slower than PyPy.
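
One direction I'm considering, since the weights don't change between calls, is to precompute the cumulative weights once and binary-search them with bisect. This is just a sketch (the factory name make_weighted_chooser is mine, and I haven't timed it above):

# Sketch only: precompute the cumulative weights once, then each draw is an
# O(log n) binary search instead of an O(n) scan.
import bisect
import random

def make_weighted_chooser(seq, weights):
    cum_weights = []
    total = 0
    for w in weights:
        total += w
        cum_weights.append(total)

    def choose():
        r = random.uniform(0, total)
        return seq[bisect.bisect_left(cum_weights, r)]

    return choose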

I've tried it with numpy this way:

import numpy
import time

weights = [1. / 1000] * 1000
st = time.time()
for i in range(1000000):
    #r = weighted_choice(seq, weights)
    #r = random.choices(seq, weights)
    r = numpy.random.choice(seq, p=weights)
print (time.time() - st)

It ran for 70 seconds.
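
I suspect the per-call overhead of numpy.random.choice dominates here. As a sketch (assuming the same seq and weights as above, and not timed), drawing all the samples in a single vectorized call avoids the Python-level loop entirely:

import numpy

seq = range(1000)
weights = [1. / 1000] * 1000  # must sum to 1 for the p parameter

# one vectorized call instead of a million Python-level calls
samples = numpy.random.choice(seq, size=1000000, p=weights)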

Upvotes: 0

Views: 701

Answers (2)

DJK

Reputation: 9274

You could take this approach with numpy. If you eliminate the for loop, you can get the true power of numpy by indexing the positions you need:

import numpy as np

# Setup (untimed, as in the question)
seq = np.arange(1000)
weights = np.random.randint(1, 100, (1000, 1))


def weights_numpy(seq, weights, iterations):
    """
    :param seq: Input sequence
    :param weights: Input weights
    :param iterations: Iterations to run
    :return: one weighted choice per iteration
    """
    r = np.random.uniform(0, weights.sum(0), (1, iterations))  # array of random draws
    ar = weights.cumsum(0)  # cumulative sum of the weights
    return seq[(ar >= r).argmax(0)]  # indices of seq that meet the condition

And the timing (Python 3, numpy 1.14.0):

%timeit weights_numpy(seq,weights,1000000)
4.05 s ± 256 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Which is a bit slower than PyPy, but not by much.
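
For illustration, a small usage sketch with the setup above (not part of the original timing):

# draws 10 weighted choices in one call; note that the boolean comparison
# builds a (len(seq), iterations) matrix, so memory grows with both dimensions
samples = weights_numpy(seq, weights, 10)
print(samples.shape)  # (10,)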

Upvotes: 0

FHTMitchell

Reputation: 12147

You can use numpy.random.choice (the p parameter takes the weights, which must sum to 1). numpy functions are generally vectorized and so run at near-C speed.

Implement as:

import numpy as np

def weighted_choice(seq, weights):
    w = np.asarray(weights, dtype=float)
    p = w / w.sum()  # normalize; can skip if weights already sum to 1
    return np.random.choice(seq, p=p)

Edit:

Timings:

%timeit np.random.choice(x, p=w)  # len(x) == 1_000_000
13 ms ± 238 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np.random.choice(y, p=w)  # len(y) == 100_000_000
1.28 s ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
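
If the same weights are reused for every draw, a further sketch (my addition, not benchmarked here) is to precompute the cumulative weights once and vectorize the lookup with np.searchsorted, which avoids redoing the normalization work on every call:

import numpy as np

def weighted_choice_many(seq, weights, n):
    # hypothetical helper: n weighted draws (with replacement) in one shot
    seq = np.asarray(seq)
    cum = np.cumsum(weights)            # cumulative weights, computed once
    r = np.random.uniform(0, cum[-1], size=n)
    return seq[np.searchsorted(cum, r)]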

Upvotes: 2
