Reputation: 2401
I was tasked to solve a specific stochastic differential equation (SDE) and an ordinary DE that both depend on each other. I've been using the Euler-Maruyama method to solve the equations, and currently need to solve 100 000 of them (they simulate particle paths and momenta). The code itself works fine, but the problem is that as I increase the number of time steps for my algorithm, the calculation time naturally increases too. I originally chose Python for the task as I'm most accustomed to it, although I was well aware of its reputation for being suboptimal for HPC work (slow loops). I'll probably have to switch to using Python as "glue" with Fortran subroutines at some point, but I was wondering if there's still some way to milk a little more performance out of the current code.
I'm currently using the function EM to solve (simulate) the DEs, and then a helper function that collects all the simulated times, paths and momenta and appends them to their respective arrays. The code looks something like this:
import numpy as np

def EM(tstart, tend, steps, x1, x2, x3, x4, x5):
    dt = float(tend - tstart) / steps  # Time step
    T = np.linspace(tstart, tend, steps)
    y = [x3*T[0] - 0.01]  # Table for the paths
    p = [x3*x2*x4]        # Table for the momentum
    pos = 0.0
    mom = 0.0
    for i in range(1, steps):
        # Euler-Maruyama update: drift term plus scaled Gaussian noise
        pos = (y[i-1]
               + dt*((x1*y[i-1])/(x3*T[i-1])
                     + (3.0*p[i-1]*(T[0]/T[i-1])/x2)
                     * (2*(x3*T[i-1] - y[i-1])/(y[i-1] + x5) - 1))
               + (np.sqrt(6*(p[i-1]*(T[0]/T[i-1])/x2)*(x3*T[i-1] - y[i-1]))
                  * np.sqrt(dt) * np.random.normal(0, 1)))
        # Boundary condition
        if pos > x3*T[i]:
            v = (pos - y[i-1])/dt
            tdot = (y[i-1] - v*T[i-1])/(x3 - v)
            pos = x3*tdot - v*(T[i-1] + dt - tdot)
        mom = p[i-1] - (1.0/3.0)*p[i-1]*(x1/(x3*T[i-1]))*(1 + 2*y[i-1]/(y[i-1] + x5))*dt
        y.append(pos)
        p.append(mom)
        # Boundary condition
        if pos < 0:
            break
    return T[0:i+1], y, p
Where x1, ..., x5 are some constants. As of now I need to use 108 000 time steps, and running the code with the built-in %timeit magic
%timeit EM(1.0, 10.0, 108000, 1.0, 1.0, 2.0, 3.0, 1.0)
gives me best-case results between 25 ms and 65 ms.
The helper function I use to collect all of these together is rather simple:
def helper(tstart, tend, steps, x1, x2, x3, x4, x5, no_of):
    timespan = []
    momentums = []
    paths = []
    for i in range(0, no_of):
        t, y, p = EM(tstart, tend, steps, x1, x2, x3, x4, x5)
        timespan.append(t)
        paths.append(y)
        momentums.append(p)
    return timespan, paths, momentums
Running this through %timeit with the following parameters
%timeit helper(1.0, 10.0, 108000, 1.0, 1.0, 2.0, 3.0, 1.0, 1000)
gives a best-case result of 1 minute 14 seconds (74 seconds), which with 100 000 particles would amount to 7 400 seconds, or roughly two hours. I can still work with this, but it's probable that I'll have to add simulations or time steps in the future.
I was originally using NumPy arrays, but switching to regular Python lists actually made the code faster. I'm guessing this is because you have to declare the size of a NumPy array up front (e.g. with np.zeros()) before filling it, unless you want to use np.append, which is terribly slow in this situation. And although the nominal number of steps is, say, 108 000, only a fraction of the simulations end up being that long, so I ended up having to trim the unused zeros away from the arrays with np.trim_zeros().
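For reference, a minimal sketch of the preallocate-and-trim pattern I mean (the update rule and cutoff here are made up, just to show the shape of it):

import numpy as np

steps = 108000
y = np.zeros(steps)  # size has to be declared up front
y[0] = 1.0
for i in range(1, steps):
    y[i] = 0.99*y[i-1]      # stand-in for the real update
    if y[i] < 0.5:          # early exit, like my boundary condition
        break
y = np.trim_zeros(y, trim='b')  # strip the unused trailing zeros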
I've been trying to use the Numba library and its @jit decorator, but I can't get it to work. It gives me the following error:
NotImplementedError: Failed at nopython (nopython frontend)
(<class 'numba.ir.Expr'>, build_list(items=[Var($0.20, <ipython-input-32-4529989cafc0> (5))]))
Could removing the helper function and just running the code with the for loop that appends the simulated arrays improve the run time? Is there a way to run the code without for loops, using array operations instead? I've heard that speeds things up quite a bit.
Any other thoughts? Your help is much appreciated.
Upvotes: 2
Views: 1060
Reputation:
Because of the iterative nature of this problem, it's not possible to replace the loop over time steps with array operations: each step depends on the result of the previous one.
As an alternative, I think Numba is indeed a good choice. Numba doesn't work with Python lists (hence the exception you received), so you're restricted to NumPy arrays. To handle the a priori unknown array size, the ndarray.resize instance method is nice to use, because it frees the unused memory (as opposed to taking a slice, which keeps a reference to the whole array; see the short demonstration after the code below). The code would look something like this:
from numba import jit
import numpy as np

@jit(nopython=True)
def EM_helper(T, x1, x2, x3, x5, y, p):
    dt = (T[-1] - T[0]) / T.shape[0]
    for i in range(1, T.shape[0]):
        # ...big calculation
        y[i] = pos
        p[i] = mom
        # Boundary condition
        if pos < 0:
            break
    return i + 1

def EM(T, x1, x2, x3, x4, x5):
    y = np.empty_like(T, dtype=float)  # Table for the path
    p = np.empty_like(T, dtype=float)  # Table for the momentum
    y[0] = x3*T[0] - 0.01
    p[0] = x3*x2*x4
    count = EM_helper(T, x1, x2, x3, x5, y, p)
    y.resize(count)
    p.resize(count)
    return T[:count], y, p
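As a quick demonstration of the resize-versus-slice difference mentioned above (the array size here is arbitrary):

import numpy as np

a = np.zeros(1000000)
view = a[:10]  # a slice is only a view: the whole buffer stays referenced
b = np.zeros(1000000)
b.resize(10)   # resize reallocates in place and frees the unused memory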
Instead of the EM_helper function you could also try to rely on Numba's automatic "loop lifting", but this is less robust in my experience; a toy sketch follows below.
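To illustrate the idea with a toy function (not your EM code): the dict below keeps Numba in object mode, but the numeric loop is still a candidate to be "lifted" out and compiled on its own:

from numba import jit
import numpy as np

@jit(forceobj=True)  # object mode; loop lifting is enabled by default
def summed(arr):
    meta = {'n': arr.shape[0]}  # plain Python dict forces object mode
    print(meta)
    total = 0.0
    for i in range(arr.shape[0]):  # Numba tries to lift this loop
        total += arr[i]
    return total

print(summed(np.arange(5.0)))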
I moved the creation of the time array, T = np.linspace(tstart, tend, steps), outside of the function, because in a quick test I found it would otherwise become a performance bottleneck.
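Usage would then look something like this, with the parameter values copied from your question (once the big calculation is filled in):

import numpy as np

T = np.linspace(1.0, 10.0, 108000)  # build the time array once and reuse it
t, y, p = EM(T, 1.0, 1.0, 2.0, 3.0, 1.0)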
Upvotes: 1