Charlie Crown
Charlie Crown

Reputation: 1089

Python: improving performance of a for loop with function call inside that depends on loop index only

I am guilty of writing code in python as if it were Fortran. I am rewriting many parts of a long code already written by myself in Fortran, because i want to extend the code significantly and it is MUCH easier to extend in python for proof of concept work. If it can be sped up enough however, I will simply use python I am not actually interested in turning the crank on the programs over and over. Once an idea is proven to work, I move on to the next problem. This is why I wish to work in python. Unfortunately, right now, it takes several weeks to run as written in python. Even an order of magnitude speedup on the following for loop would make it a feasible testing platform.

A similar question for the R language has been posed

Improving loop performance with function call inside

but surprisingly I don't see one for python. This one is similar but the function in the for loop has dependencies, which mine does not

Improve performance of a for loop in Python (possibly with numpy or numba)

A huge bottleneck is a single for loop

Simple Code

import numpy as np

part  = 3 # a random index of an array, fixed here for example purposes
nmol = 1000
energy = np.zeros((nmol),dtype=np.float_)

for i in range(nmol):
    energy[i] = np.where( part != i,function(part,i),0.0) # if i = part, energy = 0.0

speeding up the function itself is another and separate problem. There must be a way to use numpy or another method to run all calls simultaneously

For example purposes lets say

def function(i,j):
    for k in range(100000): # this loop is simply to make the time about a second or 2
        ener = (i + j) * (i * j) # entirely arbitrary and not my real problem
    return ener

In reality my function calls several function that depend on part and "i".

The full working example is:

import numpy as np
import time as time

def function(i,j):
    for k in range(10000): # this loop is simply to make the time about a second or 2
        ener = (i + j) * (i * j) # entirely arbitrary and not my real problem
    return ener

part  = 3 # a random index of an array, fixed here for example purposes
nmol = 1000
energy = np.zeros((nmol),dtype=np.float_)

start = time.time()
for i in range(nmol):
    energy[i] = np.where( part != i,function(part,i),0.0) # if i = part, energy = 0.0

end = time.time()
print('time: ', end-start)

I am using Python version 3.6. It is imperative that in the loop index "i" does not interact with the index "part".

Upvotes: 4

Views: 1532

Answers (2)

serge-sans-paille
serge-sans-paille

Reputation: 2139

based on your vectorized solution, you can get an extra speedup using pythran

Original code:

import numpy as np

def function(i, part):
    for k in range(10000):
        ener = (i + part) + (i * part)
    return ener

and associated benchmark:

python -m timeit -s 'import numpy as np; part  = 3; nmol = 1000; part_list = np.arange(0,nmol,1); part_list = np.delete(part_list, part); from a import function' 'function(part_list, part)'
10 loops, best of 3: 37.3 msec per loop

Then adding a pythran export comment

import numpy as np

#pythran export function(int64[], int64)
def function(i, part):
    for k in range(10000):
        ener = (i + part) + (i * part)
    return ener

And compiling the module with:

pythran a.py

Gives an extra boost:

python -m timeit -s 'import numpy as np; part  = 3; nmol = 1000; part_list = np.arange(0,nmol,1); part_list = np.delete(part_list, part); from a import function' 'function(part_list, part)'
1000000 loops, best of 3: 1.53 usec per loop

Upvotes: 1

Charlie Crown
Charlie Crown

Reputation: 1089

I am quite surprised that I am the one to figure out an answer - I am not intentionally answering my own question... I have just been thinking about it on my own for while now.

rather than a for loop from index 0 to nmol, make an integer array from 0 to nmol. simply call the function by passing the integer array. Thus an array input receives an array output. I modified the function so that it didn't require the constant "part"

This vectorized solution is ~27 times faster than the for loop, giving me the order of magnitude I required.

As the array length of size nmol gets bigger, the speed gains increase, and vice versa.

import numpy as np
import time as time

def function(i):
    for k in range(10000):
        ener = (i + part) + (i * part) # entirely arbitrary and not my real problem
    return ener

part  = 3 # a random index of an array, fixed here for example purposes
nmol = 1000

start = time.time()

part_list = np.arange(0,nmol,1)
part_list = np.delete(part_list,part)  # remove the self index

energy =function(part_list) # calls the function in a vectorized form.

end = time.time()
time2 = end-start

Upvotes: 1

Related Questions