Speed up calculation in Astropy

Question

I am trying to calculate the sum of distances between a list of points using astropy. However my implementation is too slow to by implemented with my data, this is one example of my code:

import pandas as pd
import numpy as np  

# synthetic data
lst2 = list(range(50))
lst2 = np.array(lst2)/50
lst3 = np.array(lst2)/51

df = pd.DataFrame(list(zip(lst2, lst3)),
               columns =['A', 'B'])

# Sum of the distance between different points
def Sum(df_frame):
    length = len(df_frame) #Size of "for" loops
    Sum = 0 
    for i in range(length - 1): 
        for j in range(i+1,length):
            c1 = SkyCoord(df_frame['A'].iloc[i]*u.deg, df_frame['A'].iloc[i]*u.deg, frame='icrs')
            c2 = SkyCoord(df_frame['B'].iloc[j]*u.deg, df_frame['B'].iloc[j]*u.deg, frame='icrs') 
            angle = c1.separation(c2).deg
            Sum += angle
    return  Sum

Sum(df)

Does anyone know how to increase the computational speed of this code?

My real data has millions of points.

Iguananaut · Accepted Answer

This answer still going to be incredibly slow especially on a large dataframe, since you have a double-loop indexing the dataframe like df['A'].loc[i] for every O(n^2) pair of elements.

I tried this with a dataframe containing just 1000 elements in each column and it took ages. For larger numbers I just gave up waiting. It speeds up considerably if you instead pass the columns to the function as normal numpy arrays, and then before performing the distance calculation also assign A_i = A[i]; B_j = B[j], i.e. like:

Using pure NumPy

def my_sum2(A, B):
    length = len(A)  # Size of "for" loops
    assert length == len(B)
    Sum = 0
    A = np.deg2rad(np.asarray(A))
    B = np.deg2rad(np.asarray(B))
    for i in range(length - 1):
        for j in range(i + 1, length):
            # print(a2, d2)
            A_i = A[i]
            B_j = B[j]
            dist = np.rad2deg(
                np.arccos(
                    np.sin(A_i) * np.sin(B_j) + \
                    np.cos(A_i) * np.cos(B_j) * \
                    np.cos(A_i - B_j)
                )
            )
            Sum += dist
    return Sum

For 100 elements I got:

>>> %timeit my_sum(df)
229 ms ± 3.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit my_sum2(df['A'], df['B'])
41.1 ms ± 2.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

But you can do much better still by pre-computing the sines and cosines using vectorized operations. This has a consequence of greater memory usage, at a tradeoff of speed (we could also build a matrix cos_A_B = np.cos(A[:, np.newaxis] - B) for the cos(A[i] - B[j]) factor, but this would be prohibitively memory-exhaustive if A and B are large):

def my_sum3(A, B):
    length = len(A)  # Size of "for" loops
    assert length == len(B)
    Sum = 0
    A = np.deg2rad(np.asarray(A))
    B = np.deg2rad(np.asarray(B))
    cos_A = np.cos(A)
    sin_A = np.sin(A)
    cos_B = np.cos(B)
    sin_B = np.sin(B)

    for i in range(length - 1):
        for j in range(i + 1, length):
            # print(a2, d2)
            dist = np.rad2deg(
                np.arccos(
                    sin_A[i] * sin_B[j] + \
                    cos_A[i] * cos_B[j] * \
                    np.cos(A[i] - B[j])
                )
            )
            Sum += dist
    return Sum

>>> %timeit my_sum3(df['A'], df['B'])
20.2 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

But for pairwise calculations with NumPy arrays we can take further advantage of NumPy's element-wise broadcasting in order to eliminate the inner for-loop completely:

def my_sum4(A, B):
    length = len(A)  # Size of "for" loops
    assert length == len(B)
    Sum = 0
    A = np.deg2rad(np.asarray(A))
    B = np.deg2rad(np.asarray(B))
    cos_A = np.cos(A)
    sin_A = np.sin(A)
    cos_B = np.cos(B)
    sin_B = np.sin(B)
    
    for i in range(length - 1):
        Sum += np.sum(np.rad2deg(np.arccos(
            sin_A[i] * sin_B[i + 1:] +
            cos_A[i] * cos_B[i + 1:] *
            np.cos(A[i] - B[i + 1:]))))

    return Sum

>>> %timeit my_sum4(df['A'], df['B'])
1.31 ms ± 71.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

There are many many other ways this could be micro-optimized, using Cython, scipy, etc. but I won't spend any more time on it here.

The other problem with this approach is it's specifically geared to the detail of the OP's question where each coordinate has identical RA and DEC for some reason, and is not generalized.

Using SkyCoord

Something Astropy beginners often miss about the SkyCoord class (and many other classes in Astropy) is that a single SkyCoord can be a container for an array of coordinates, not just a single coordinate.

In the OP's question they are creating millions of SkyCoord objects, one for each coordinate. In fact you could simply do this:

>>> c1 = SkyCoord(df['A']*u.deg, df['A']*u.deg, frame='icrs')
>>> c2 = SkyCoord(df['B']*u.deg, df['B']*u.deg, frame='icrs')

Methods like SkyCoord.separation also work element-wise just like other functions on NumPy arrays:

>>> c1.separation(c2)

So for every pair-wise separation you could use a similar technique as in my my_sum4 solution, freeing you from having to write the calculation yourself:

def my_sum5(c1, c2):
    angle_sum = 0
    for idx in range(len(c1)):
        angle_sum += c1[idx].separation(c2[idx + 1:]).sum()
    return angle_sum

>>> my_sum5(c1, c2)

This is admittedly considerably slower than the last pure-NumPy solution:

>>> %timeit my_sum5(c1, c2)
166 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

This overhead is the cost of some of Astropy's high-level interfaces, and I agree with MSH where they wrote in their answer:

You should know some times using ready products is faster since all the tools are available. However in some condition, as yours, using ready product makes you slower in execution time.

That is, if you really have high-performance needs for large datasets, it still might be better to use a hand-optimized solution.

However, we can still do a little better within just Astropy. If you look at the source code for SkyCoord.separation we see that it's little more than a higher-level interface to a function called angular_separation which computes the separation using a somewhat more computationally expensive Vincenty formula, using the lat/lon components of the coord's spherical representation.

For a computation like this you can eliminate a lot of overhead (like Astropy's automatical coordinate conversion) while using this function directly like:

def my_sum6(c1, c2):
    angle_sum = 0
    lon1 = c1.spherical.lon.to(u.rad).value
    lat1 = c1.spherical.lat.to(u.rad).value
    lon2 = c2.spherical.lon.to(u.rad).value
    lat2 = c2.spherical.lat.to(u.rad).value
    
    for idx in range(len(c1)):
        angle_sum += angular_separation(lon1[idx], lat1[idx], lon2[idx+1:], lat2[idx+1:]).sum()
    return np.rad2deg(angle_sum)

This is basically doing what SkyCoord.separation is doing, but it's pre-computing the lat/lon arrays for both coordinates and converting them to radians first, then calling angular_separation on them. It also skips the overhead of assering that both coordinates are in the same frame (they are both ICRS in this case so we are assuming they are). This performs almost as well as my_sum4:

>>> %timeit my_sum6(c1, c2)
2.26 ms ± 123 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In fact, in this case the main thing making it slower than my_sum4 is just the increased complexity of the Vincenty formula used, and the fact that it's more generalized (not assuming that RA == DEC for each coordinate).

Speed up calculation in Astropy

Answers (2)

Using pure NumPy

Using SkyCoord

Related Questions