Mateusz
Mateusz

Reputation: 450

python loop faster than numpy array operations

I am working on a code for a model based on some Fourier transforms. Currently I am trying to optimize a part of it, so that it would be usable with some big amount of data. While trying to do that, I found a strange behavior, mainly a loop version of my code turn out to be faster than the same code written with numpy. Testing code is as follows:

# -*- coding: utf-8 -*-
import numpy as np
import timeit


def fourier_coef_loop(ts, times, k_max):
    coefs = np.zeros(k_max, dtype=float)
    t = 2.0 * np.pi * (times - times[0]) / times[-1]
    x = np.dot(np.arange(1,k_max+1)[np.newaxis].T,t[np.newaxis])
    for k in xrange(1, k_max + 1):
        cos_k = np.cos(x[k-1])
        coefs[k - 1] = (ts[-1] - ts[0]) + (ts[:-1] * (cos_k[:-1] - cos_k[1:])).sum()
    return(coefs)


def fourier_coef_np(ts, times, k_max):
    coefs = np.zeros(k_max, dtype=float)
    t = 2.0 * np.pi * (times - times[0]) / times[-1]
    x = np.dot(np.arange(1,k_max+1)[np.newaxis].T,t[np.newaxis])
    coefs = np.add(np.einsum('ij,j->i',np.diff(np.cos(x)), -ts[:-1]), (ts[-1] - ts[0]))
    return(coefs)


if __name__ == '__main__':
    iterations = 10
    size = 20000
    setup = "from __main__ import fourier_coef_loop, fourier_coef_np, size\n" \
            "import numpy as np"

#    arg = np.random.normal(size=size)
#    print(np.all(fourier_coef_np(arg, np.arange(size,dtype=float), size / 2) == fourier_coef_loop(arg, np.arange(size,dtype=float), size / 2)))

    time_loop = timeit.timeit("fourier_coef_loop(np.random.normal(size=size), np.arange(size,dtype=float), size / 2)",
                              setup=setup, number=iterations)
    print("With loop: {} s".format(time_loop))

    time_np = timeit.timeit("fourier_coef_np(np.random.normal(size=size), np.arange(size,dtype=float), size / 2)",
                            setup=setup, number=iterations)
    print("With numpy: {} s".format(time_np))

It gives following results:

With loop: 60.8385488987 s
With numpy: 64.9192998409 s

Can someone please tell me why is the loop version faster than the purely numpy version? I have totally ran out of ideas. I would also appreciate any suggestions on how to make this particular function faster.

Upvotes: 1

Views: 506

Answers (1)

As I noted in a comment, I don't think what you're timing is related to the difference between the loop and the vectorized version, your times even include the generation of 20000 normal pseudorandom numbers. You should try to put as much setup as possible outside the timing if you want to get a precise picture.

Anyway, your code has a few weird points, so here's my own suggestion:

def fourier_coef_new(ts,times,k_max):
    # no need to pre-allocate coefs, you're rebinding later
    t = 2.0 * np.pi * (times - times[0]) / times[-1]
    x = np.arange(1,k_max+1)[:,None] * t # make use of array broadcasting
    coefs = -np.dot(np.diff(np.cos(x)),ts[:-1]) + ts[-1]-ts[0] # einsum was dot
    return(coefs)

I tested this function for a given set of inputs, and it gave the same result as your functions. Note the [:,None] (or equivalent [:,np.newaxis]) way to introduce a singleton dimension into your array. Once you have an array of shape (N,1) and one of shape (M,) (latter compatible with (1,M)), their product will be (N,M) according to the array broadcasting rules of numpy.

I did a quick timing of the three functions with a given, pre-generated set of data, but 1. in python 3, 2. with size = 2000 and 3. using IPython's %timeit built-in. I'm not claiming that these results are any more reliable than yours, but I suspect that the above version should be fastest:

In [37]: %timeit fourier_coef_loop(ts,times,k_max)
1000 loops, best of 3: 1.09 ms per loop

In [38]: %timeit fourier_coef_np(ts,times,k_max)
1000 loops, best of 3: 1.06 ms per loop

In [39]: %timeit fourier_coef_new(ts,times,k_max)
1000 loops, best of 3: 1.05 ms per loop

As you can see, the numpy versions seem to be marginally faster. Since only a smaller part of your code is different between the two timed cases (and there are heavy trigonometric functions involved in both cases), this seems reasonable.

Upvotes: 1

Related Questions