nico.wagner

Reputation: 125

How to use numpy instead of for loop with different vectors

I want to improve my code to make it faster, and right now I have a for loop that I don't know how to replace with numpy functions.

import numpy as np

N = 1000000
d = 2000

p = np.linspace(0,210,211)
alpha = np.linspace(0.00000000000001, np.pi/2, N)
d1 = d*np.cos(alpha)

for i in range(len(p)):
    p1 = p[i]*np.cos(alpha)
    k = 1/((p[i]+d)*np.tan(alpha))
    z = np.exp(p1+d1)**k

First, I tried to vectorize p1, d1 and k into matrices of the right sizes, but I don't know how to calculate z without a loop. Furthermore, I don't think this is an efficient approach.

import numpy as np

N = 1000000
d = 2000

p = np.linspace(0,210,211)
alpha = np.linspace(0.00000000000001, np.pi/2, N)
d1 = d*np.cos(alpha)


p1 = np.outer(np.cos(alpha),p)
d1 = np.matrix(d1).T * np.matrix(np.ones(len(p)))
k = 1/(np.outer(np.tan(alpha),p)+np.outer(np.tan(alpha),d))

Upvotes: 1

Views: 87

Answers (2)

Matthias Huschle

Reputation: 711

While I think the answer by Pranav Hosangadi is correct and shows the modern way of using numpy, I doubt that it is the fastest, at least for the given parameters.

Looking at your original code, the flaws I see are:

  • Using for _p in p instead of for i in range(len(p)) would be more Pythonic, but will not affect performance.
  • Calculating np.cos(alpha) and np.tan(alpha) in every iteration is not optimal; they should be calculated once, before the loop.
  • You use a loop that could be vectorized, but how bad is that exactly?

If you fix the second one, there are two main differences in the computing process when the for loop is eliminated:

  1. You lose the overhead of the for loop itself, which is a constant-time lookup in p, and some Python internals. For 211 iterations you are probably still in the lower region of microseconds. So not much to win.
  2. The objects to handle get larger. Pranav's answer scales them down, so this will not have a big effect for his solution. But your original parameters are different: 211 * 1000000 float64 values put us in the range of 1-2 GiB (see the quick estimate below). This means that during your calculations your CPU won't get any cache hits and needs to load everything from RAM, which has higher latency than the L1/L2/L3 caches. In the for-loop case the working set might just fit in cache, which gives a tremendous speed boost.
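
A quick back-of-the-envelope check of that estimate (assuming float64, i.e. 8 bytes per element, and counting just one fully vectorized intermediate array such as z; the variable names are only illustrative):

n_rows = 211            # len(p)
n_cols = 1_000_000      # len(alpha)
bytes_per_float = 8     # float64
print(n_rows * n_cols * bytes_per_float / 2**30)  # about 1.57 GiB per array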

So I think whether full vectorisation is slower or faster depends on the size of your arrays and on the machine you're on. On my current machine, the following is 2.5 times faster than Pranav's solution:

cosalpha = np.cos(alpha)   # precompute the trigonometric terms once, outside the loop
tanalpha = np.tan(alpha)
d1 = d * cosalpha          # same d1 as in the question
z = np.zeros((p.size, alpha.size))   # one row per element of p
for i, _p in enumerate(p):
    p1 = _p * cosalpha
    k = 1 / ((_p + d) * tanalpha)
    z[i] = np.exp(p1 + d1)**k

Upvotes: 2

pho

Reputation: 25489

If you want one row per element in p, and one column per element in alpha, you just need to add an axis to p so it's a column vector. Numpy's broadcasting takes care of the rest:

import numpy as np

N = 100 # modified to run quickly
d = 20

# reshape p to a column vector
p = np.linspace(0,210,211).reshape((-1, 1))

alpha = np.linspace(0.00000000000001, np.pi/2, N)
d1 = d*np.cos(alpha)

p1 = p*np.cos(alpha)        # shape (211, 100)
k = 1/((p+d)*np.tan(alpha)) 
z = np.exp(p1+d1)**k


Note that the power operation overflows to infinity (for small alpha the exponent k gets huge), but that's not related to numpy.
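
If you want to see where that happens, here's a minimal check building on the variables from the snippet above (np.errstate just silences the warning while computing):

with np.errstate(over='ignore'):
    z = np.exp(p1 + d1)**k
print(np.isinf(z).sum(), "of", z.size, "entries overflowed")
print(np.argwhere(np.isinf(z))[:5])   # indices of the first few overflowing entries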

Also note that while this answer does show you how to vectorize your operation, it might not make sense to do so, since you're only saving a 211-iteration loop. It makes a lot more sense to vectorize larger loops, for example if p.size were much greater than alpha.size.
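
If in doubt, time both variants for your actual sizes. A minimal sketch using the standard timeit module (the function names loop_version and broadcast_version are just illustrative; expect the same overflow warnings as above):

import timeit
import numpy as np

N = 100
d = 20
p = np.linspace(0, 210, 211)
alpha = np.linspace(0.00000000000001, np.pi/2, N)

def loop_version():
    cosalpha, tanalpha = np.cos(alpha), np.tan(alpha)
    d1 = d * cosalpha
    z = np.zeros((p.size, alpha.size))
    for i, _p in enumerate(p):
        z[i] = np.exp(_p*cosalpha + d1)**(1/((_p + d)*tanalpha))
    return z

def broadcast_version():
    pc = p.reshape((-1, 1))              # column vector, shape (211, 1)
    d1 = d*np.cos(alpha)
    return np.exp(pc*np.cos(alpha) + d1)**(1/((pc + d)*np.tan(alpha)))

print(timeit.timeit(loop_version, number=10))       # seconds for 10 runs
print(timeit.timeit(broadcast_version, number=10))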

Upvotes: 2
