auser
auser

Reputation: 43

How to speed up this numpy.arange loop?

In a python program, the following function is called about 20,000 times from another function that is called about 1000 times from yet another function that executes 30 times. Thus the total number of times this particular function is called is about 600,000,000. In python it takes more than two hours (perhaps much longer; I aborted the program without waiting for it to finish), while essentially the same task coded in Java takes less than 5 minutes. If I change the 20,000 above to 400 (keeping everything else in the rest of the program untouched), the total time drops to about 4 minutes (this means this particular function is the culprit). What can I do to speed up the Python version, or is it just not possible? No lists are manipulated inside this function (there are lists elsewhere in the whole program, but in those places I tried to use numpy arrays as far as possible). I understand that replacing python lists with numpy arrays speeds things up, but there are cases in my program (not in this particular function) where I must build a list iteratively, using append; and those must-have lists are lists of objects (not floats or ints), so numpy would be of little help even if I converted those lists of objects to numpy arrays.

def compute_something(arr):      
      '''
            arr is received as a numpy array of ints and floats (I think python upcasts them to all floats,
             doesn’t it?).
             Inside this function, elements of arr are accessed using indexing (arr[0], arr[1], etc.), because
             each element of the array has its own unique use. It’s not that I need the array as a whole (as in
             arr**2 or sum(arr)). 
             The arr elements are used in several simple arithmetic operations involving nothing costlier than
             +, -, *, /, and numpy.log(). There is no other loop inside this function; there are a few if’s though.
             Inside this function, use is made of constants imported from other modules (I doubt the
             importing, as in AnotherModule.x is expensive). 
      '''
      for x in numpy.arange(float1, float2, float3):
              do stuff
      return a, b, c       # Return a tuple of three floats

Edit: Thanks for all the comments. Here’s the inside of the function (I made the variable names short for convenience). The ndarray array arr has only 3 elements in it. Can you please suggest any improvement?

def compute_something(arr):     
    a = Mod.b * arr[1] * arr[2]  + Mod.c
    max = 0.0
    for c in np.arange(a, arr[1] * arr[2] * (Mod.d – Mod.e),  Mod.f):
              i = c / arr[2]
              m1 = Mod.A * np.log( (i / (arr[1] *Mod.d)) +  (Mod.d/Mod.e))
              m2 = -Mod.B * np.log(1.0 - (i/ (arr[1] *Mod.d)) - (Mod.d /  
                    Mod.e))      
              V = arr[0] * (Mod.E - Mod.r * i / arr[1] - Mod.r * Mod.d  -                      
                  m1 – m2)
              p = c * V /1000.0
              if p > max:
                   max = p
                   vmp = V

    pen =   Mod.COEFF1 * (Mod.COEFF2 - max) if max < Mod.CONST else 0.0
    wo = Mod.COEFF3 * arr[1] * arr[0] + Mod.COEFF4 * abs(Mod.R5 - vmp) + 
         Mod.COEFF6 * arr[2]
    w = wo + pen
    return vmp, max, w

Upvotes: 0

Views: 822

Answers (2)

Grzegorz Bokota
Grzegorz Bokota

Reputation: 1804

Python supports profiling of code. (module cProfile). Also there is option to use line_profiler to find most expensive part of code tool here. So you do not need to guessing which part of code is most expensive.

In this code which you presten the problem is in usage for loop which generates many conversion between types of objects. If you use numpy you can vectorize your calculation.

I try to rewrite your code to vectorize your operation. You do not provide information what is Mod object, but I have hope it will work.

def compute_something(arr):     
    a = Mod.b * arr[1] * arr[2]  + Mod.c
    # start calculation on vectors instead of for lop
    c_arr = np.arange(a, arr[1] * arr[2] * (Mod.d – Mod.e),  Mod.f)
    i_arr = c_arr/arr[2]
    m1_arr = Mod.A * np.log( (i_arr / (arr[1] *Mod.d)) +  (Mod.d/Mod.e))
    m2_arr = -Mod.B * np.log(1.0 - (i_arr/ (arr[1] *Mod.d)) - (Mod.d /  
        Mod.e))      
    V_arr = arr[0] * (Mod.E - Mod.r * i_arr / arr[1] - Mod.r * Mod.d  -                      
        m1_arr – m2_arr)
    p = c_arr * V_arr / 1000.0
    max_val = p.max()  # change name to avoid conflict with builtin function
    max_ind = np.nonzero(p == max_val)[0][0]
    vmp = V_arr[max_ind]

    pen =   Mod.COEFF1 * (Mod.COEFF2 - max_val) if max_val < Mod.CONST else 0.0
    wo = Mod.COEFF3 * arr[1] * arr[0] + Mod.COEFF4 * abs(Mod.R5 - vmp) + 
         Mod.COEFF6 * arr[2]
    w = wo + pen
    return vmp, max_val, w

Upvotes: 1

U13-Forward
U13-Forward

Reputation: 71610

I would suggest to use range as it is approximately 2 times faster:

def python():
    for i in range(100000):
        pass
def numpy():
    for i in np.arange(100000):
        pass
from timeit import timeit
print(timeit(python, number=1000))
print(timeit(numpy, number=1000))

Output:

5.59282787179696
10.027646953771665

Upvotes: 1

Related Questions