Reputation: 21

Is there a fast Python algorithm to find all points in a dataset which lie in a given circle?

I have a large amount of data (over 10^5 points). I am searching a fast algorithm which finds all points in the dataset, which lie in a circle given by its center point and radius.

I thought about using an kd-tree to calculate for example the 10 nearest points to the circle's center, and then check if they are inside the circle. But I am not sure if this is the correct way.

Upvotes: 1

Answers (5)

kaya3

Reputation: 51034

A space-partitioning data structure like a k-d tree or quadtree could answer your query exactly; no need for a heuristic like "10 nearest points".

You can recursively search the tree:

If the current node's bounding rectangle does not intersect the circle at all, ignore it.
Otherwise, if the current node's bounding rectangle is completely contained within the circle, return all points within it.
Otherwise, if the current node's bounding rectangle partially overlaps with the circle:
- If the current node is a leaf, test each point individually to see if it's within the circle, and return the ones which are.
- Otherwise, recurse on the current node's children, and return all of the points returned by the recursive calls.

For step 2, the rectangle is fully contained in the circle if and only if its four corners are. For steps 1 and 3, you can either test whether the rectangle intersects the circle's bounding box, and conservatively recurse because the rectangle might intersect the circle; or you can do an exact (but slightly more complicated) check between the rectangle and the actual circle, saving a few unnecessary recursive calls.

Since this is Python, instead of returning the points in a collection and then concatenating/unioning the results of recursive calls, you could yield the points from a recursive generator function. That simplifies the implementation somewhat.

Upvotes: 2

max9111

Reputation: 6482

If you are only interested in the number of points which are in the circle you can try Numba.

import numpy as np
import numba as nb
import numexpr as ne

def method2(X,Y,cx,cy,r):
    """Numexpr method"""
    res = ne.evaluate('((X-cx)**2 + (Y-cy)**2) < r**2')
    return res

@nb.njit(fastmath=True,parallel=True)
def method3(X,Y,cx,cy,r):
    acc=0
    for i in nb.prange(X.shape[0]):
        if ((X[i]-cx)**2 + (Y[i]-cy)**2) < r**2:
            acc+=1
    return acc

Timings

# Ensure repeatable, deterministic randomness!
np.random.seed(42)

# Generate test arrays
N = 1000000
X = np.random.rand(N)
Y = np.random.rand(N)

# Define centre and radius
cx = cy = r = 0.5

#@Mark Setchell
%timeit method2(X,Y,cx,cy,r)
#825 µs ± 22.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit method3(X,Y,cx,cy,r)
#480 µs ± 94.4 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Upvotes: 1

Mark Setchell

Reputation: 207425

I benchmarked a numexpr version against a simple Numpy implementation as follows:

#!/usr/bin/env python3

import numpy as np
import numexpr as ne

# Ensure repeatable, deterministic randomness!
np.random.seed(42)

# Generate test arrays
N = 1000000
X = np.random.rand(N)
Y = np.random.rand(N)

# Define centre and radius
cx = cy = r = 0.5

def method1(X,Y,cx,cy,r):
    """Straight Numpy determination of points in circle"""
    d = (X-cx)**2 + (Y-cy)**2
    res = d < r**2
    return res

def method2(X,Y,cx,cy,r):
    """Numexpr determination of points in circle"""
    res = ne.evaluate('((X-cx)**2 + (Y-cy)**2)<r**2')
    return res

def method3(data,a,b,r): 
    """List based determination of points in circle, with pre-filtering using a square"""
    in_square_points = [(x,y) for (x,y) in data if a-r < x < a+r and b-r < y < b+r] 
    in_circle_points = [(x,y) for (x,y) in in_square_points if (x-a)**2 + (y-b)**2 < r**2] 
    return in_circle_points

# Timing
%timeit method1(X,Y,cx,cy,r)

%timeit method2(X,Y,cx,cy,r)

# Massage input data (before timing) to match agorithm
data=[(x,y) for x,y in zip(X,Y)]
%timeit method3(data,cx,cy,r)

I then timed it in IPython as follows:

%timeit method1(X,Y,cx,cy,r)                                                                                                                     
6.68 ms ± 246 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit method2(X,Y,cx,cy,r)                                                                                                                     
743 µs ± 17.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit method3(data,cx,cy,r)
1.11 s ± 9.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So the numexpr version came out 9x faster. As the points lie in the range [0..1], the algorithm is effectively calculating pi and the two methods come out the same:

method1(X,Y,cx,cy,r).sum()                                                                                                                       
784973

method2(X,Y,cx,cy,r).sum()                                                                                                                       
784973

len(method3(data,cx,cy,r))                                                                                                                       
784973

4 * 784973 / N
3.139

Note: I should point out that numexpr multi-threads your code across multiple CPU cores for you, automatically. If you feel like experimenting, with the number of threads, you can change it dynamically before calling method2(), or even inside there, with:

# Split calculations across 6 threads
ne.set_num_threads(6)

Anyone else wishing to test the speed of their method is welcome to use my code as a benchmarking framework.

Upvotes: 4

Phoenixo

Reputation: 2113

If you want first to filter a large amount of your dataset without huge computations, you can use the Square of size (2r x 2r) with the same center as the circle (where r is the circle's radius).

Have a look at this picture :

If you have the center's coordinates (a,b) and r the radius, then the points (x,y) inside the square verify :

in_square_points = [(x,y) for (x,y) in data if a-r < x < a+r and b-r < y < b+r]

And finally after this filter you can apply the circle equation :

in_circle_points = [(x,y) for (x,y) in in_square_points if (x-a)**2 + (y-b)**2 < r**2]

** EDIT **

if your input is structured like this :

data = [
    [13, 45],
    [-1, 2],
    ...
    [60, -4]
]

Then you can try, if you prefer common for-loops :

in_square_points = []
for i in range(len(data)):
    x = data[i][0]
    y = data[i][1]
    if a-r < x < a+r and b-r < y < b+r:
        in_square_points.append([x, y])
print(in_square_points)

Upvotes: 1

Seraph Wedd

Reputation: 864

To check whether a point (a, b) is within a circle of center (x, y) and radius r, then you can simply do a computation:

within_circle = ((x-a)**2 + (y-b)**2) <= r*r)

This equation uses the property of the circle on which can get the absolute distance to a point (which is also used in the distance formula if you noticed).

Upvotes: 1

Is there a fast Python algorithm to find all points in a dataset which lie in a given circle?

Answers (5)

Related Questions