hklel
hklel

Reputation: 1634

Vectorize this for loop in numpy

I am trying to compute matrix z (defined below) in python with numpy.

Here's my current solution (using 1 for loop)

z = np.zeros((n, k))
for i in range(n):
    v = pi * (1 / math.factorial(x[i])) * np.exp(-1 * lamb) * (lamb ** x[i])
    numerator = np.sum(v)
    c = v / numerator
    z[i, :] = c
return z

Is it possible to completely vectorize this computation? I need to do this computation for thousands of iterations, and matrix operations in numpy is much faster than huge for loops.

Upvotes: 0

Views: 92

Answers (1)

unutbu
unutbu

Reputation: 879581

Here is a vectorized version of E. It replaces the for-loop and scalar arithmetic with NumPy broadcasting and array-based arithmetic:

def alt_E(x):
    x = x[:, None]
    z = pi * (np.exp(-lamb) * (lamb**x)) / special.factorial(x)
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z

I ran em.py to get a sense for the typical size of x, lamb, pi, n and k. On data of this size, alt_E is about 120x faster than E:

In [32]: %timeit E(x)
100 loops, best of 3: 11.5 ms per loop

In [33]: %timeit alt_E(x)
10000 loops, best of 3: 94.7 µs per loop

In [34]: 11500/94.7
Out[34]: 121.43611404435057

This is the setup I used for the benchmark:

import math
import numpy as np
import scipy.special as special

def alt_E(x):
    x = x[:, None]
    z = pi * (np.exp(-lamb) * (lamb**x)) / special.factorial(x)
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z


def E(x):
    z = np.zeros((n, k))
    for i in range(n):
        v = pi * (1 / math.factorial(x[i])) * \
            np.exp(-1 * lamb) * (lamb ** x[i])
        numerator = np.sum(v)
        c = v / numerator
        z[i, :] = c
    return z

n = 576
k = 2

x = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5])

lamb = np.array([ 0.84835141, 1.04025989])
pi = np.array([ 0.5806958, 0.4193042])

assert np.allclose(alt_E(x), E(x))

By the way, E could also be calculated using scipy.stats.poisson:

import scipy.stats as stats
pois = stats.poisson(mu=lamb)

def alt_E2(x):
    z = pi * pois.pmf(x[:,None])
    denom = z.sum(axis=1)[:, None]
    z /= denom
    return z

but this does not turn out to be faster, at least for arrays of this length:

In [33]: %timeit alt_E(x)
10000 loops, best of 3: 94.7 µs per loop

In [102]: %timeit alt_E2(x)
1000 loops, best of 3: 278 µs per loop

For larger x, alt_E2 is faster:

In [104]: x = np.random.random(10000)

In [106]: %timeit alt_E(x)
100 loops, best of 3: 2.18 ms per loop

In [105]: %timeit alt_E2(x)
1000 loops, best of 3: 643 µs per loop

Upvotes: 1

Related Questions