rldw
rldw

Reputation: 145

Avoiding numpy loops while calculating intersections

I'd like to speed up the following calculations handling r rays and n spheres. Here is what I got so far:

# shape of mu1 and mu2 is (r, n)
# shape of rays is (r, 3)
# note that intersections has 2n columns because for every sphere one can
# get up to two intersections (secant, tangent, no intersection)
intersections = np.empty((r, 2*n, 3))
for col in range(n):
    intersections[:, col, :] = rays * mu1[:, col][:, np.newaxis]
    intersections[:, col + n, :] = rays * mu2[:, col][:, np.newaxis]

# [...]

# calculate euclidean distance from the center of gravity (0,0,0)
distances = np.empty((r, 2 * n))
for col in range(n):
    distances[:, col] = np.linalg.norm(intersections[:, col], axis=1)
    distances[:, col + n] = np.linalg.norm(intersections[:, col + n], axis=1)

I tried speeding things up by avoiding the for-Loops, but couldn't figure out how to broadcast the arrays properly so that I only need a single function call. Any help is much appreciated.

Upvotes: 2

Views: 105

Answers (1)

Divakar
Divakar

Reputation: 221504

Here's a vectorized way using broadcasting -

intersections = np.hstack((mu1,mu2))[...,None]*rays[:,None,:]
distances = np.sqrt((intersections**2).sum(2))

The last step could be replaced with an use of np.einsum like so -

distances = np.sqrt(np.einsum('ijk,ijk->ij',intersections,intersections))

Or replace almost the whole thing with np.einsum for another vectorized way, like so -

mu = np.hstack((mu1,mu2))
distances = np.sqrt(np.einsum('ij,ij,ik,ik->ij',mu,mu,rays,rays))

Runtime tests and verify outputs -

def original_app(mu1,mu2,rays):
    intersections = np.empty((r, 2*n, 3))
    for col in range(n):
        intersections[:, col, :] = rays * mu1[:, col][:, np.newaxis]
        intersections[:, col + n, :] = rays * mu2[:, col][:, np.newaxis]

    distances = np.empty((r, 2 * n))
    for col in range(n):
        distances[:, col] = np.linalg.norm(intersections[:, col], axis=1)
        distances[:, col + n] = np.linalg.norm(intersections[:, col + n], axis=1)
    return distances                    

def vectorized_app1(mu1,mu2,rays):
    intersections = np.hstack((mu1,mu2))[...,None]*rays[:,None,:]
    return np.sqrt((intersections**2).sum(2))

def vectorized_app2(mu1,mu2,rays):
    intersections = np.hstack((mu1,mu2))[...,None]*rays[:,None,:]
    return np.sqrt(np.einsum('ijk,ijk->ij',intersections,intersections))

def vectorized_app3(mu1,mu2,rays):
    mu = np.hstack((mu1,mu2))
    return np.sqrt(np.einsum('ij,ij,ik,ik->ij',mu,mu,rays,rays))

Timings -

In [101]: # Inputs
     ...: r = 1000
     ...: n = 1000
     ...: mu1 = np.random.rand(r, n)
     ...: mu2 = np.random.rand(r, n)
     ...: rays = np.random.rand(r, 3)


In [102]: np.allclose(original_app(mu1,mu2,rays),vectorized_app1(mu1,mu2,rays))
Out[102]: True

In [103]: np.allclose(original_app(mu1,mu2,rays),vectorized_app2(mu1,mu2,rays))
Out[103]: True

In [104]: np.allclose(original_app(mu1,mu2,rays),vectorized_app3(mu1,mu2,rays))
Out[104]: True

In [105]: %timeit original_app(mu1,mu2,rays)
     ...: %timeit vectorized_app1(mu1,mu2,rays)
     ...: %timeit vectorized_app2(mu1,mu2,rays)
     ...: %timeit vectorized_app3(mu1,mu2,rays)
     ...: 
1 loops, best of 3: 306 ms per loop
1 loops, best of 3: 215 ms per loop
10 loops, best of 3: 140 ms per loop
10 loops, best of 3: 136 ms per loop

Upvotes: 2

Related Questions