Reputation: 2401
I was tasked to solve a specific stochastic differential equation (SDE) and an ordinary DE that both depend on each other. I've been using the Euler-Maruyama method to solve the equations, and currently need to solve 100 000 of them (they simulate particle paths and momenta). The code itself works fine, but the problem is that as I increase the number of time steps for my algorithm, the calculation time naturally increases too. I originally chose Python for the task as I'm most accustomed to it, although I was well aware of its reputation for being suboptimal for HPC work (slow loops). I'll probably have to switch to using Python as "glue" with Fortran subroutines at some point, but I was wondering if there's still some way to milk a little more performance out of the current code.
I'm currently using the function EM to solve (simulate) the DEs, and then a helper function that collects all the simulated times, paths and momenta and appends them to their respective arrays. The code looks something like this:
import numpy as np

def EM(tstart, tend, steps, x1, x2, x3, x4, x5):
    dt = float(tend - tstart) / steps  # Time step
    T = np.linspace(tstart, tend, steps)
    y = [x3*T[0] - 0.01]  # Table for the paths
    p = [x3*x2*x4]        # Table for the momentum
    pos = 0.0
    mom = 0.0
    for i in range(1, steps):
        # Euler-Maruyama update: drift term plus scaled Gaussian noise
        pos = (y[i-1]
               + dt*((x1*y[i-1])/(x3*T[i-1])
                     + (3.0*p[i-1]*(T[0]/T[i-1])/x2)
                     * (2*(x3*T[i-1] - y[i-1])/(y[i-1] + x5) - 1))
               + (np.sqrt(6*(p[i-1]*(T[0]/T[i-1])/x2)*(x3*T[i-1] - y[i-1]))
                  * np.sqrt(dt) * np.random.normal(0, 1)))
        # Boundary condition
        if pos > x3*T[i]:
            v = (pos - y[i-1])/dt
            tdot = (y[i-1] - v*T[i-1])/(x3 - v)
            pos = x3*tdot - v*(T[i-1] + dt - tdot)
        mom = p[i-1] - (1.0/3.0)*p[i-1]*(x1/(x3*T[i-1]))*(1 + 2*y[i-1]/(y[i-1] + x5))*dt
        y.append(pos)
        p.append(mom)
        # Boundary condition
        if pos < 0:
            break
    return T[0:i+1], y, p
Where x1, ..., x5 are some constants. As of now I need to use 108 000 time steps, and running the code with the built-in %timeit magic
%timeit EM(1.0, 10.0, 108000, 1.0, 1.0, 2.0, 3.0, 1.0)
gives me best-case results between 25 ms and 65 ms.
The helper function I use to collect all of these together is rather simple:
def helper(tstart, tend, steps, x1, x2, x3, x4, x5, no_of):
    timespan = []
    momentums = []
    paths = []
    for i in range(0, no_of):
        t, y, p = EM(tstart, tend, steps, x1, x2, x3, x4, x5)
        timespan.append(t)
        paths.append(y)
        momentums.append(p)
    return timespan, paths, momentums
Running this through %timeit with the following parameters
%timeit helper(1.0, 10.0, 108000, 1.0, 1.0, 2.0, 3.0, 1.0, 1000)
gives a best-case result of 1 minute 14 seconds (74 seconds), which with 100 000 particles would amount to 7 400 seconds, or roughly two hours. I can still work with this, but it's probable that I'll have to add simulations or time steps in the future.
I was originally using NumPy arrays, but switching to regular Python lists actually made the code faster. I'm guessing this is because you have to declare the size of a NumPy array up front (e.g. with np.zeros()) before filling it, unless you want to use np.append, which is terribly slow in this situation. And although the nominal number of steps is, say, 108 000, only a fraction of the simulations end up being that long, so I ended up having to trim the unused zeros away from the arrays with np.trim_zeros().
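For reference, a minimal sketch of the preallocate-and-trim pattern I mean (the update rule and cutoff here are made up, just to show the shape of it):

import numpy as np

steps = 108000
y = np.zeros(steps)  # size has to be declared up front
y[0] = 1.0
for i in range(1, steps):
    y[i] = 0.99*y[i-1]      # stand-in for the real update
    if y[i] < 0.5:          # early exit, like my boundary condition
        break
y = np.trim_zeros(y, trim='b')  # strip the unused trailing zeros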
I've been trying to use the Numba library and its @jit decorator, but I can't get it to work. It gives me the following error:
NotImplementedError: Failed at nopython (nopython frontend)
(<class 'numba.ir.Expr'>, build_list(items=[Var($0.20, <ipython-input-32-4529989cafc0> (5))]))
Could removing the helper function and just running the code with the for loop that appends the simulated arrays improve the run time? Is there a way to run the code without for loops, using array operations instead? I've heard that speeds things up quite a bit.
Any other thoughts? Your help is much appreciated.
Upvotes: 2
Views: 1060
Reputation:
Because of the iterative nature of this problem, it's not possible to replace the loop over time steps with array operations: each step depends on the result of the previous one.
As an alternative, I think Numba is indeed a good choice. Numba doesn't work with Python lists (hence the exception you received), so you're restricted to NumPy arrays. To handle the a priori unknown array size, the ndarray.resize instance method is nice to use, because it frees the unused memory (as opposed to taking a slice, which keeps a reference to the whole array; see the short demonstration after the code below). The code would look something like this:
from numba import jit
import numpy as np

@jit(nopython=True)
def EM_helper(T, x1, x2, x3, x5, y, p):
    dt = (T[-1] - T[0]) / T.shape[0]
    for i in range(1, T.shape[0]):
        # ...big calculation
        y[i] = pos
        p[i] = mom
        # Boundary condition
        if pos < 0:
            break
    return i + 1

def EM(T, x1, x2, x3, x4, x5):
    y = np.empty_like(T, dtype=float)  # Table for the path
    p = np.empty_like(T, dtype=float)  # Table for the momentum
    y[0] = x3*T[0] - 0.01
    p[0] = x3*x2*x4
    count = EM_helper(T, x1, x2, x3, x5, y, p)
    y.resize(count)
    p.resize(count)
    return T[:count], y, p
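As a quick demonstration of the resize-versus-slice difference mentioned above (the array size here is arbitrary):

import numpy as np

a = np.zeros(1000000)
view = a[:10]  # a slice is only a view: the whole buffer stays referenced
b = np.zeros(1000000)
b.resize(10)   # resize reallocates in place and frees the unused memory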
Instead of the EM_helper function you could also try to rely on Numba's automatic "loop lifting", but this is less robust in my experience; a toy sketch follows below.
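To illustrate the idea with a toy function (not your EM code): the dict below keeps Numba in object mode, but the numeric loop is still a candidate to be "lifted" out and compiled on its own:

from numba import jit
import numpy as np

@jit(forceobj=True)  # object mode; loop lifting is enabled by default
def summed(arr):
    meta = {'n': arr.shape[0]}  # plain Python dict forces object mode
    print(meta)
    total = 0.0
    for i in range(arr.shape[0]):  # Numba tries to lift this loop
        total += arr[i]
    return total

print(summed(np.arange(5.0)))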
I moved the creation of the time array, T = np.linspace(tstart, tend, steps), outside of the function, because in a quick test I found it would otherwise become a performance bottleneck.
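Usage would then look something like this, with the parameter values copied from your question (once the big calculation is filled in):

import numpy as np

T = np.linspace(1.0, 10.0, 108000)  # build the time array once and reuse it
t, y, p = EM(T, 1.0, 1.0, 2.0, 3.0, 1.0)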
Upvotes: 1