Reputation: 143
I am trying to calculate the sum of distances between a list of points using astropy. However my implementation is too slow to by implemented with my data, this is one example of my code:
import pandas as pd
import numpy as np
# synthetic data
lst2 = list(range(50))
lst2 = np.array(lst2)/50
lst3 = np.array(lst2)/51
df = pd.DataFrame(list(zip(lst2, lst3)),
columns =['A', 'B'])
# Sum of the distance between different points
def Sum(df_frame):
length = len(df_frame) #Size of "for" loops
Sum = 0
for i in range(length - 1):
for j in range(i+1,length):
c1 = SkyCoord(df_frame['A'].iloc[i]*u.deg, df_frame['A'].iloc[i]*u.deg, frame='icrs')
c2 = SkyCoord(df_frame['B'].iloc[j]*u.deg, df_frame['B'].iloc[j]*u.deg, frame='icrs')
angle = c1.separation(c2).deg
Sum += angle
return Sum
Sum(df)
Does anyone know how to increase the computational speed of this code?
My real data has millions of points.
Upvotes: 0
Views: 749
Reputation: 23306
This answer still going to be incredibly slow especially on a large dataframe, since you have a double-loop indexing the dataframe like df['A'].loc[i]
for every O(n^2) pair of elements.
I tried this with a dataframe containing just 1000 elements in each column and it took ages. For larger numbers I just gave up waiting. It speeds up considerably if you instead pass the columns to the function as normal numpy arrays, and then before performing the distance calculation also assign A_i = A[i]; B_j = B[j]
, i.e. like:
def my_sum2(A, B):
length = len(A) # Size of "for" loops
assert length == len(B)
Sum = 0
A = np.deg2rad(np.asarray(A))
B = np.deg2rad(np.asarray(B))
for i in range(length - 1):
for j in range(i + 1, length):
# print(a2, d2)
A_i = A[i]
B_j = B[j]
dist = np.rad2deg(
np.arccos(
np.sin(A_i) * np.sin(B_j) + \
np.cos(A_i) * np.cos(B_j) * \
np.cos(A_i - B_j)
)
)
Sum += dist
return Sum
For 100 elements I got:
>>> %timeit my_sum(df)
229 ms ± 3.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit my_sum2(df['A'], df['B'])
41.1 ms ± 2.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
But you can do much better still by pre-computing the sines and cosines using vectorized operations. This has a consequence of greater memory usage, at a tradeoff of speed (we could also build a matrix cos_A_B = np.cos(A[:, np.newaxis] - B)
for the cos(A[i] - B[j])
factor, but this would be prohibitively memory-exhaustive if A and B are large):
def my_sum3(A, B):
length = len(A) # Size of "for" loops
assert length == len(B)
Sum = 0
A = np.deg2rad(np.asarray(A))
B = np.deg2rad(np.asarray(B))
cos_A = np.cos(A)
sin_A = np.sin(A)
cos_B = np.cos(B)
sin_B = np.sin(B)
for i in range(length - 1):
for j in range(i + 1, length):
# print(a2, d2)
dist = np.rad2deg(
np.arccos(
sin_A[i] * sin_B[j] + \
cos_A[i] * cos_B[j] * \
np.cos(A[i] - B[j])
)
)
Sum += dist
return Sum
>>> %timeit my_sum3(df['A'], df['B'])
20.2 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
But for pairwise calculations with NumPy arrays we can take further advantage of NumPy's element-wise broadcasting in order to eliminate the inner for-loop completely:
def my_sum4(A, B):
length = len(A) # Size of "for" loops
assert length == len(B)
Sum = 0
A = np.deg2rad(np.asarray(A))
B = np.deg2rad(np.asarray(B))
cos_A = np.cos(A)
sin_A = np.sin(A)
cos_B = np.cos(B)
sin_B = np.sin(B)
for i in range(length - 1):
Sum += np.sum(np.rad2deg(np.arccos(
sin_A[i] * sin_B[i + 1:] +
cos_A[i] * cos_B[i + 1:] *
np.cos(A[i] - B[i + 1:]))))
return Sum
>>> %timeit my_sum4(df['A'], df['B'])
1.31 ms ± 71.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
There are many many other ways this could be micro-optimized, using Cython, scipy, etc. but I won't spend any more time on it here.
The other problem with this approach is it's specifically geared to the detail of the OP's question where each coordinate has identical RA and DEC for some reason, and is not generalized.
Something Astropy beginners often miss about the SkyCoord
class (and many other classes in Astropy) is that a single SkyCoord
can be a container for an array of coordinates, not just a single coordinate.
In the OP's question they are creating millions of SkyCoord
objects, one for each coordinate. In fact you could simply do this:
>>> c1 = SkyCoord(df['A']*u.deg, df['A']*u.deg, frame='icrs')
>>> c2 = SkyCoord(df['B']*u.deg, df['B']*u.deg, frame='icrs')
Methods like SkyCoord.separation
also work element-wise just like other functions on NumPy arrays:
>>> c1.separation(c2)
<Angle [0.0130013 , 1.18683992, 0.82050812, ...] deg>
So for every pair-wise separation you could use a similar technique as in my my_sum4
solution, freeing you from having to write the calculation yourself:
def my_sum5(c1, c2):
angle_sum = 0
for idx in range(len(c1)):
angle_sum += c1[idx].separation(c2[idx + 1:]).sum()
return angle_sum
>>> my_sum5(c1, c2)
<Angle 2368.14558945 deg>
This is admittedly considerably slower than the last pure-NumPy solution:
>>> %timeit my_sum5(c1, c2)
166 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
This overhead is the cost of some of Astropy's high-level interfaces, and I agree with MSH where they wrote in their answer:
You should know some times using ready products is faster since all the tools are available. However in some condition, as yours, using ready product makes you slower in execution time.
That is, if you really have high-performance needs for large datasets, it still might be better to use a hand-optimized solution.
However, we can still do a little better within just Astropy. If you look at the source code for SkyCoord.separation
we see that it's little more than a higher-level interface to a function called angular_separation
which computes the separation using a somewhat more computationally expensive Vincenty formula, using the lat/lon components of the coord's spherical representation.
For a computation like this you can eliminate a lot of overhead (like Astropy's automatical coordinate conversion) while using this function directly like:
def my_sum6(c1, c2):
angle_sum = 0
lon1 = c1.spherical.lon.to(u.rad).value
lat1 = c1.spherical.lat.to(u.rad).value
lon2 = c2.spherical.lon.to(u.rad).value
lat2 = c2.spherical.lat.to(u.rad).value
for idx in range(len(c1)):
angle_sum += angular_separation(lon1[idx], lat1[idx], lon2[idx+1:], lat2[idx+1:]).sum()
return np.rad2deg(angle_sum)
This is basically doing what SkyCoord.separation
is doing, but it's pre-computing the lat/lon arrays for both coordinates and converting them to radians first, then calling angular_separation
on them. It also skips the overhead of assering that both coordinates are in the same frame (they are both ICRS in this case so we are assuming they are). This performs almost as well as my_sum4
:
>>> %timeit my_sum6(c1, c2)
2.26 ms ± 123 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In fact, in this case the main thing making it slower than my_sum4
is just the increased complexity of the Vincenty formula used, and the fact that it's more generalized (not assuming that RA == DEC for each coordinate).
Upvotes: 3
Reputation: 2399
You should know some times using ready products is faster since all the tools are available. However in some condition, as yours, using ready product makes you slower in execution time.
In Your code you're creating
Then you just calculate the distance between them using separation
. These objects are more powerful then what you're using for and that's why they're slower.
Now we know one can calculate angular separation using:
arccos(sin(delta1) * sin(delta2) + cos(delta1) * cos(delta2) * sin(alpha1 - alpha2))
See: https://en.wikipedia.org/wiki/Angular_distance
Now you can implement it. Just don't forget your angles are in degrees
and trigonometric functions accepts angles in radians
def my_sum(df_frame):
length = len(df_frame) # Size of "for" loops
Sum = 0
df_frame_rad = np.deg2rad(df_frame)
for i in range(length - 1):
for j in range(i + 1, length):
# print(a2, d2)
dist = np.rad2deg(
np.arccos(
np.sin(df_frame_rad['A'].iloc[i]) * np.sin(df_frame_rad['B'].iloc[j]) + \
np.cos(df_frame_rad['A'].iloc[i]) * np.cos(df_frame_rad['B'].iloc[j]) * \
np.cos(df_frame_rad['A'].iloc[i] - df_frame_rad['B'].iloc[j])
)
)
Sum += dist
return Sum
For same data set, the results are:
Astropy Function: 533.3069727968582
Pure math function: 533.3069727982754
Not bad.
Astropy Function took, 2.932075262069702 sec
to finish
Pure math function took: 0.07899618148803711 sec
to finish
Upvotes: 3