Reputation: 131
I have a list of times (called times in my code, produced by the code suggested to me in the thread astropy.io fits efficient element access of a large table) and I want to do some statistical tests for periodicity, using Zn^2 and epoch folding tests. Some steps in the code take quite a while to run, and I am wondering if there is a faster way to do it. I have tried the equivalent map and lambda functions, but that takes even longer. My list of times has several hundred or maybe thousands of elements, depending on the dataset. Here is my code:
phase=[(x-mintime)*testfreq[m]-int((x-mintime)*testfreq[m]) for x in times]
# the above step takes 3 seconds for the dataset I am using for testing
# testfreq[m] is just one of several hundred frequencies I am testing
# times is of type numpy.ndarray
phasebin=[int(ph*numbins) for ph in phase]
# 1 second (numbins is 20)
powerarray=[phasebin.count(n) for n in range(0,numbins-1)]
# 0.3 seconds
poweravg=np.mean(powerarray)
chisq[m]=sum([(pow-poweravg)**2/poweravg for pow in powerarray])
# the above 2 steps are very quick
for n in range(0,maxn): # maxn is 3
    cosparam=sum([(np.cos(2*np.pi*(n+1)*ph)) for ph in phase])
    sinparam=sum([(np.sin(2*np.pi*(n+1)*ph)) for ph in phase])
    # these steps each take 4 seconds
    z2[m,n]=sum(z2[m,])+(cosparam**2+sinparam**2)/count
    # this is quick (count is the number of times)
As this steps through several hundred frequencies on either side of frequencies identified through an FFT search, it takes a very long time to run. The same functionality in a lower-level language runs much more quickly, but I need some of the Python modules for plotting, etc. I am hoping that Python can be persuaded to do some of the operations, particularly the phase, phasebin, powerarray, cosparam, and sinparam calculations, significantly faster, but I am not sure how to make this happen. Can anyone tell me how this can be done, or do I have to write and call functions in C or Fortran? I know that this could be done in a few minutes in Fortran, for example, but this Python code takes hours as it is.
Thanks very much.
Upvotes: 0
Views: 1346
Reputation: 23773
A for loop substitute - operating on complete arrays.
First multiply phase by 2*pi*n using broadcasting:
import numpy as np

phase = np.arange(10)   # stand-in for your phase array
maxn = 3
ens = np.arange(1, maxn+1)   # array([1, 2, 3])
two_pi_ens = 2*np.pi*ens
b = phase * two_pi_ens[:, np.newaxis]
b.shape is (3, 10) - one row for each value in ens (1 to maxn).
Take the cosine then sum to get the three cosine parameters
c = np.cos(b)
c_param = c.sum(axis=1)   # c_param.shape is (3,)
Take the sine then sum to get the three sine parameters
s = np.sin(b)
s_param = s.sum(axis=1)   # s_param.shape is (3,)
Sum of the squares divided by count
d = (np.square(c_param) + np.square(s_param)) / count
# d.shape is (3,)
Assign to z2
for n in range(maxn):
    z2[m,n] = z2[m,:].sum() + d[n]
That loop is doing a cumulative sum. numpy ndarrays have a cumsum method. If maxn is small (3 in your case) it may not be noticeably faster.
z2[m,:] += d
z2[m,:].cumsum(out = z2[m,:])
To illustrate:
>>> a = np.ones((3,3))
>>> a
array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])
>>> m = 1
>>> d = (1,2,3)
>>> a[m,:] += d
>>> a
array([[ 1., 1., 1.],
[ 2., 3., 4.],
[ 1., 1., 1.]])
>>> a[m,:].cumsum(out = a[m,:])
array([ 2., 5., 9.])
>>> a
array([[ 1., 1., 1.],
[ 2., 5., 9.],
[ 1., 1., 1.]])
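Putting the pieces together, here is a minimal sketch for one trial frequency. It assumes times is a numpy array and that mintime, testfreq, maxn, count, m and z2 are defined as in your question, and that z2[m,:] starts at zero so the cumulative sum reproduces your original loop; np.floor gives the same fractional part as int() here because the shifted times are non-negative.

import numpy as np

t = (times - mintime) * testfreq[m]
phase = t - np.floor(t)                      # fractional part, replaces the list comprehension
ens = np.arange(1, maxn + 1)
b = 2 * np.pi * ens[:, np.newaxis] * phase   # shape (maxn, len(times)) via broadcasting
d = (np.cos(b).sum(axis=1)**2 + np.sin(b).sum(axis=1)**2) / count
z2[m, :] += d
z2[m, :].cumsum(out=z2[m, :])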
Upvotes: 1
Reputation: 117951
Instead of Python lists, you could use the numpy library; it is much faster for linear-algebra-type operations. For example, to add two arrays element-wise:
>>> import numpy as np
>>> a = np.array([1,2,3,4,5])
>>> b = np.array([2,3,4,5,6])
>>> a + b
array([ 3, 5, 7, 9, 11])
Similarly, you can multiply an array by a scalar, which multiplies each element as you'd expect:
>>> 2 * a
array([ 2, 4, 6, 8, 10])
As far as speed goes, here is the Python list equivalent of adding two lists:
>>> c = [1,2,3,4,5]
>>> d = [2,3,4,5,6]
>>> [i+j for i,j in zip(c,d)]
[3, 5, 7, 9, 11]
Then timing the two:
>>> from timeit import timeit
>>> setup = '''
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([2,3,4,5,6])'''
>>> timeit('a+b', setup)
0.521275608325351
>>> setup = '''
c = [1,2,3,4,5]
d = [2,3,4,5,6]'''
>>> timeit('[i+j for i,j in zip(c,d)]', setup)
1.2781205834379108
In this small example, numpy was more than twice as fast.
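The same idea applies directly to the phase and binning steps in your question. A minimal sketch, assuming times is already a numpy array and that mintime, testfreq, m and numbins are defined as in the question (note that np.bincount counts all numbins bins, whereas the original list comprehension over range(0, numbins-1) counted one fewer):

import numpy as np

t = (times - mintime) * testfreq[m]
phase = t - np.floor(t)                                 # elementwise fractional part, no Python loop
phasebin = (phase * numbins).astype(int)                # bin index for every event
powerarray = np.bincount(phasebin, minlength=numbins)   # counts per bin in one call
poweravg = powerarray.mean()
chisq_m = ((powerarray - poweravg)**2 / poweravg).sum()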
Upvotes: 5