dragoncat16
dragoncat16

Reputation: 131

Is it possible to speed up an element-by-element array operation in Python?

I have a list of times (called times in my code, produced by the code suggested to me in the thread astropy.io fits efficient element access of a large table) and I want to do some statistical tests for periodicity, using Zn^2 and epoch folding tests. Some steps in the code take quite a while to run, and I am wondering if there is a faster way to do it. I have tried the equivalent map and lambda functions, but that takes even longer. My list of times has several hundred or maybe thousands of elements, depending on the dataset. Here is my code:

phase=[(x-mintime)*testfreq[m]-int((x-mintime)*testfreq[m]) for x in times]
# the above step takes 3 seconds for the dataset I am using for testing
# testfreq[m] is just one of several hundred frequencies I am testing
# times is of type numpy.ndarray

phasebin=[int(ph*numbins)for ph in phase]
# 1 second (numbins is 20)

powerarray=[phasebin.count(n) for n in range(0,numbins-1)]
# 0.3 seconds

poweravg=np.mean(powerarray)
chisq[m]=sum([(pow-poweravg)**2/poweravg for pow in powerarray])
# the above 2 steps are very quick


for n in range(0,maxn):  # maxn is 3
    cosparam=sum([(np.cos(2*np.pi*(n+1)*ph)) for ph in phase])
    sinparam=sum([(np.sin(2*np.pi*(n+1)*ph)) for ph in phase])
    # these steps each take 4 seconds

    z2[m,n]=sum(z2[m,])+(cosparam**2+sinparam**2)/count
    # this is quick (count is the number of times)

As this steps through several hundred frequencies on either side of frequencies identified through an FFT search, it takes a very long time to run. The same functionality in a lower level language runs much more quickly, but I need some of the Python modules for plotting, etc. I am hoping that Python can be persuaded to do some of the operations, particularly the phase, phasebin, powerarray, cosparam, and sinparam calculations, significantly faster, but I am not sure how to make this happen. Can anyone tell me how this can be done, or do I have to write and call functions in C or fortran? I know that this could be done in a few minutes e.g. in fortran, but this Python code takes hours as it is.

Thanks very much.

Upvotes: 0

Views: 1346

Answers (2)

wwii
wwii

Reputation: 23773

for loop substitute - operating on complete arrays

First multiply phase by 2*pi*n using broadcasting

phase = np.arange(10)
maxn = 3
ens = np.arange(1, maxn+1) # array([1, 2, 3])
two_pi_ens = 2*np.pi*ens
b = phase * two_pi_ens[:, np.newaxis]

b.shape is (3,10) one row for each value of range(1, maxn)

Take the cosine then sum to get the three cosine parameters

c = np.cos(b)
c_param = c.sum(axis = 1)   # c_param.shape is 3

Take the sine then sum to get the three sine parameters

s = np.sin(b)
s_param = s.sum(axis = 1)   # s_param.shape is 3

Sum of the squares divided by count

d = (np.square(c_param) + np.square(s_param)) / count
# d.shape is (3,)

Assign to z2

for n in range(maxn):
    z2[m,n] = z2[m,:].sum() + d[n]

That loop is doing a cumulative sum. numpy ndarrays have a cumsum method. If maxn is small (3 in your case) it may not be noticeably faster.

z2[m,:] += d
z2[m,:].cumsum(out = z2[m,:])

To illustrate:

>>> a = np.ones((3,3))
>>> a
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
>>> m = 1
>>> d = (1,2,3)
>>> a[m,:] += d
>>> a
array([[ 1.,  1.,  1.],
       [ 2.,  3.,  4.],
       [ 1.,  1.,  1.]])
>>> a[m,:].cumsum(out = a[m,:])
array([ 2.,  5.,  9.])
>>> a
array([[ 1.,  1.,  1.],
       [ 2.,  5.,  9.],
       [ 1.,  1.,  1.]])
>>> 

Upvotes: 1

Cory Kramer
Cory Kramer

Reputation: 117951

Instead of Python lists, you could use the numpy library, it is much faster for linear algebra type operations. For example to add two arrays in an element-wise fashion

>>> import numpy as np
>>> a = np.array([1,2,3,4,5])
>>> b = np.array([2,3,4,5,6])
>>> a + b
array([ 3,  5,  7,  9, 11])

Similarly, you can multiply arrays by scalars which multiplies each element as you'd expect

>>> 2 * a
array([ 2,  4,  6,  8, 10])

As far as speed, here is the Python list equivalent of adding two lists

>>> c = [1,2,3,4,5]
>>> d = [2,3,4,5,6]
>>> [i+j for i,j in zip(c,d)]
[3, 5, 7, 9, 11]

Then timing the two

>>> from timeit import timeit

>>> setup = '''
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([2,3,4,5,6])'''
>>> timeit('a+b', setup)
0.521275608325351

>>> setup = '''
c = [1,2,3,4,5]
d = [2,3,4,5,6]'''
>>> timeit('[i+j for i,j in zip(c,d)]', setup)
1.2781205834379108

In this small example numpy was more than twice as fast.

Upvotes: 5

Related Questions