Gabriel

Reputation: 42459

Improve performance of function without parallelization

A few weeks ago I posted a question (Speed up nested for loop with elements exponentiation), which got a very good answer from abarnert. This question is related: it builds on the performance improvements suggested in that answer.

I need to improve the performance of a function that computes three factors, A, B and C, and combines them as A*exp(B+C).

Here's an MWE of my code:

import numpy as np
import timeit

def random_data(N):
    # Generate some random data.
    return np.random.uniform(0., 10., N)

# Data lists.
array1 = np.array([random_data(4) for _ in range(1000)])
array2 = np.array([random_data(3) for _ in range(2000)])

# Function.
def func():
    # Empty list that holds all values obtained in for loop.    
    lst = []
    for elem in array1:
        # Avoid numeric errors if one of these values is 0.            
        e_1, e_2 = max(elem[0], 1e-10), max(elem[1], 1e-10)
        # Obtain three parameters.
        A = 1./(e_1*e_2)
        B = -0.5*((elem[2]-array2[:,0])/e_1)**2
        C = -0.5*((elem[3]-array2[:,1])/e_2)**2
        # Apply exponential.
        value = A*np.exp(B+C)
        # Store value in list.
        lst.append(value)

    return lst

# time function.
func_time = timeit.timeit(func, number=100)
print(func_time)

Is it possible to speed up func without resorting to parallelization?

Upvotes: 1

Views: 143

Answers (1)

Claudiu

Reputation: 229581

Here's what I have so far. My approach is to do as much of the math as possible on whole NumPy arrays rather than element by element.

Optimizations:

  • Calculate all the A values at once in NumPy, outside the loop
  • Refactor the calculation of B and C by splitting each into factors, so that the factors depending only on array1 can be precomputed in NumPy

Code:

def optfunc():
    e0 = array1[:, 0]
    e1 = array1[:, 1]
    e2 = array1[:, 2]
    e3 = array1[:, 3]

    ar0 = array2[:, 0]
    ar1 = array2[:, 1]

    As = 1./(e0 * e1)
    Bfactors = -0.5 * (1 / e0**2)
    Cfactors = -0.5 * (1 / e1**2)

    lst = []
    for i, elem in enumerate(array1):
        B = ((elem[2] - ar0) ** 2) * Bfactors[i]
        C = ((elem[3] - ar1) ** 2) * Cfactors[i]

        value = As[i]*np.exp(B+C)

        lst.append(value)

    return lst

# Check that both versions give the same values.
print(np.allclose(optfunc(), func()))

# time function.
func_time = timeit.timeit(func, number=10)
opt_func_time = timeit.timeit(optfunc, number=10)
print("%.3fs --> %.3fs" % (func_time, opt_func_time))

Result:

True
0.759s --> 0.485s
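
One detail worth noting: func in the question clamps the first two columns with max(..., 1e-10) to avoid dividing by zero, while the precomputed factors above use the raw columns. A minimal sketch of how the same guard could be applied to whole columns with np.maximum follows. The helper name precompute_factors, the floor parameter and the sample array are assumptions made for illustration; they are not part of the original code.

```python
import numpy as np

def precompute_factors(array1, floor=1e-10):
    # Clamp the two scale columns so that a zero cannot cause a division by
    # zero, mirroring max(elem[0], 1e-10) and max(elem[1], 1e-10) in func().
    e0 = np.maximum(array1[:, 0], floor)
    e1 = np.maximum(array1[:, 1], floor)
    As = 1.0 / (e0 * e1)
    Bfactors = -0.5 / e0**2
    Cfactors = -0.5 / e1**2
    return As, Bfactors, Cfactors

# Small hypothetical input with a zero in the first column.
sample = np.array([[0.0, 2.0, 1.0, 1.0],
                   [4.0, 0.5, 3.0, 2.0]])
As, Bfactors, Cfactors = precompute_factors(sample)
```

With random uniform data an exact zero is very unlikely, so the guard should not change the results of the comparison above in practice.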

At this point I'm stuck. I managed to do it entirely without Python for loops, but it is slower than the version above, for a reason I do not yet understand:

def optfunc():
    x = array1
    y = array2

    x0 = x[:, 0]
    x1 = x[:, 1]
    x2 = x[:, 2]
    x3 = x[:, 3]

    y0 = y[:, 0]
    y1 = y[:, 1]

    A = 1./(x0 * x1)
    Bfactors = -0.5 * (1 / x0**2)
    Cfactors = -0.5 * (1 / x1**2)

    B = (np.transpose([x2]) - y0)**2 * np.transpose([Bfactors])
    C = (np.transpose([x3]) - y1)**2 * np.transpose([Cfactors])

    return np.transpose([A]) * np.exp(B + C)

Result:

True
0.780s --> 0.558s

Note, however, that the latter returns a single 2-D np.array, whereas the former returns a Python list of 1-D arrays. This might account for the difference, but I'm not sure.
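
To compare the two return types on equal footing, the list can be wrapped in np.array before checking it against the broadcast result. The sketch below does that on a small input and writes the broadcast with np.newaxis instead of np.transpose([x]). The function names, the explicit array arguments (the original functions read the global arrays), the floor parameter and the seeded test data are all assumptions for illustration. One plausible contributor to the slowdown, not verified here, is that the fully broadcast version materializes several 1000 x 2000 float64 temporaries of about 16 MB each, while the loop works on 2000-element rows that fit in cache more easily.

```python
import numpy as np

def loop_version(array1, array2, floor=1e-10):
    # Row-by-row reference, equivalent to func() in the question but taking
    # the arrays as arguments instead of reading globals.
    out = []
    for elem in array1:
        e_1, e_2 = max(elem[0], floor), max(elem[1], floor)
        B = -0.5 * ((elem[2] - array2[:, 0]) / e_1) ** 2
        C = -0.5 * ((elem[3] - array2[:, 1]) / e_2) ** 2
        out.append(np.exp(B + C) / (e_1 * e_2))
    return out

def broadcast_version(array1, array2, floor=1e-10):
    # Column vectors of shape (N, 1) broadcast against rows of shape (M,).
    e0 = np.maximum(array1[:, 0], floor)[:, np.newaxis]
    e1 = np.maximum(array1[:, 1], floor)[:, np.newaxis]
    B = -0.5 * ((array1[:, 2][:, np.newaxis] - array2[:, 0]) / e0) ** 2
    C = -0.5 * ((array1[:, 3][:, np.newaxis] - array2[:, 1]) / e1) ** 2
    return np.exp(B + C) / (e0 * e1)  # shape (N, M)

rng = np.random.RandomState(0)  # fixed seed, only for a reproducible check
a1 = rng.uniform(0., 10., (5, 4))
a2 = rng.uniform(0., 10., (7, 3))

as_list = loop_version(a1, a2)        # Python list of 1-D arrays
as_array = broadcast_version(a1, a2)  # single 2-D array
same = np.allclose(np.array(as_list), as_array)
```

Timing both variants with the list converted to an array inside the timed function would show whether the return type alone explains the gap.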

Upvotes: 4
