Reputation: 376
I have a list of coordinates a
. I want to specify a radius r
and then list all points on a 2D grid within the specified radius of any point in a
, together with the minimum distance of each of those grid-points to any point in a
. Since a
is substantial (~1000-2000 points) I want to be as efficient as possible.
My implementation so far uses this code to list all coordinates within the given radius of a point; then iterates that over all coordinates in a
; then flattens the output, takes a set (there will be MANY duplicates) - this is the set of coordinates within the specified radius of any point on the line - then calculates the minimum distances of that set to a
using scipy.spatial.distance.cdist
:
import numpy as np
np.random.seed(123)
a = np.random.randint(50, size=(100, 2))
def collect(coord, sigma: float =3.0):
""" create a small collection of points in a neighborhood of some point
"""
neighborhood = []
x=coord[0]
y=coord[1]
X = int(sigma)
for i in range(-X, X + 1):
Y = int(pow(sigma * sigma - i * i, 1/2))
for j in range(-Y, Y + 1):
neighborhood.append((x + i, y + j))
return neighborhood
rad = 10
coord_list = [collect(coord, rad) for coord in a]
coord_set = np.array(list(set([val for sublist in coord_list for val in sublist])))
from scipy.spatial import distance
dists = distance.cdist(coord_set, a).min(axis=1)
Result:
coord_set
> array([[26, 21],
[18, 17],
[50, 6],
...,
[14, 47],
[15, 12],
[ 7, 8]])
dists
> array([2.23606798, 5.65685425, 1.41421356, ..., 3.16227766, 3. ,
1.41421356])
Does anyone have any ways I can improve this and do it more efficiently?
Upvotes: 1
Views: 1430
Reputation: 26201
You can adapt the answer I gave to your other linked question in a very straightforward way. The outcome is also very fast (~425 ms for 10K points in a
).
Edit: For sparse cases (where the number of points actually filtered is a small portion of the complete grid), I also add a sparse_grid_within_radius()
version below.
(Important: use scipy >= 1.6.0
, where the Python implementation of KDTree
has been replaced by cKDTree
. See the release notes).
# please use scipy >= 1.6.0
from scipy.spatial import KDTree
def grid_within_radius(a, radius=10, resolution=(1,1), origin=(0,0)):
pmin = (a - origin).min(axis=0) - radius
pmax = (a - origin).max(axis=0) + radius
imin = np.floor(pmin / resolution)
imax = np.ceil(pmax / resolution) + 1
xv, yv = np.meshgrid(np.arange(imin[0], imax[0]), np.arange(imin[1], imax[1]))
grid = np.stack((xv, yv), axis=-1) * resolution + origin
dist, _ = KDTree(a).query(grid, k=1, distance_upper_bound=radius + 0.01)
idx = dist <= radius
return grid[idx], dist[idx]
First, the OP's example:
np.random.seed(123)
a = np.random.randint(10, size=(100, 2))
g, d = grid_within_radius(a)
To compare with the OP's result, we need to sort their solution (coord_set, dists
):
def sort2d(a, other=None):
other = a if other is None else other
return other[np.lexsort((a[:, 0], a[:, 1]))]
With this, we can check that our solution is the same:
>>> np.allclose(g, sort2d(coord_set))
True
>>> np.allclose(d, sort2d(coord_set, dists))
True
And another example (with a different grid resolution and radius):
g, d = grid_within_radius(a, radius=0.6, resolution=(.11, .47))
plt.scatter(*a.T, s=10, c='r')
plt.scatter(*g.T, s=1)
a = np.random.randint(1000, size=(10_000, 2))
%timeit grid_within_radius(a)
# 425 ms ± 528 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
The above works well when the number of points returned is a significant portion of the total grid (e.g. 30% or more). But for very sparse cases (e.g. where a combination of the bounding box of a
, the radius
and the grid resolution
result in generating a huge grid and then eliminate most of it), then it is slow. To illustrate, the following is a sparse case for a = np.random.randint(0, 200, (10, 2))
:
The version below fixes that by generating "patches" of grid around quantized locations instead of the full grid.
import numpy as np
import pandas as pd
from scipy.spatial import KDTree
def unique_rows(a):
# np.unique(a, axis=0) is slow, in part because it sorts;
# using pandas.drop_duplicates() instead.
# see https://github.com/numpy/numpy/issues/11136#issuecomment-822000680
return pd.DataFrame(a).drop_duplicates().values
def sparse_grid_within_radius(a, radius=10, resolution=(1,1), origin=(0,0), rbox=1):
resolution = np.array(resolution)
box_size = radius * rbox
nxy0 = np.floor(radius / resolution).astype(int)
nxy1 = np.ceil((box_size + radius) / resolution).astype(int) + 1
patch = np.stack(list(map(np.ravel, np.indices(nxy0 + nxy1))), axis=1) - nxy0
ar = np.floor((a - origin) / box_size) * box_size
ia = unique_rows(np.floor(ar / resolution).astype(int))
grid = unique_rows((ia[:, None] + patch).reshape(-1, 2)) * resolution + origin
dist, _ = KDTree(a).query(grid, k=1, distance_upper_bound=radius * 1.01)
idx = dist <= radius
return grid[idx], dist[idx]
With this, even very sparse results are fast.
a = np.random.randint(0, 4000, (100, 2))
%timeit g, d = grid_within_radius(a)
# 3.88 s ± 10.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit gs, ds = sparse_grid_within_radius(a)
# 29.6 ms ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
import perfplot
perfplot.show(
setup=lambda n: np.random.randint(0, n, (100, 2)),
kernels=[grid_within_radius, sparse_grid_within_radius],
n_range=[2 ** k for k in range(14)],
equality_check=lambda a, b: np.allclose(sort2d(a[0]), sort2d(b[0])),
)
Upvotes: 3
Reputation: 114230
Let's inspect your implementation carefully. Notice that you are already poised and partially compute a distance metric in collect
.
neighborhood
a dict
instead of a list
, keyed by grid point with a value of minimum distance. You could entirely eliminate the call to set
and cdist
.a
can contain float values, you should your range from int(coord[0] - rad)
to int(coord[0] + rad) + 1
. int(0.5 - 10)
is -9
, while int(0.5) - 10
is -10
.Points 2 and 3 are relatively minor improvements.
Here is an example:
rad = 10
rad2 = rad**2
points = {}
for coord in a:
for x in range(int(np.ceil(coord[0] - rad)), int(coord[0] + rad) + 1):
dx2 = (x - coord[0])**2
chord = np.sqrt(rad2 - dx2)
for y in range(int(np.ceil(coord[1] - chord)), int(coord[1] + chord) + 1):
d2 = dx2 + (y - coord[1])**2
key = x, y
points[key] = min(d2, points.get(key, rad2))
To convert this into numpy arrays:
grids = np.array(list(points.keys()))
nearest = np.sqrt(np.fromiter(points.values(), count=len(points), dtype=float))
Upvotes: 0