Reputation: 127
I've come across a small problem while using this module. In fact, the module does exactly what I'm asking it to do, which is to find the nearest grid points for given coordinates.
But when the given coordinates are very close to a grid point and the grid spacing is larger along one axis than the other, it gives something like:
In this image, the point whose nearest neighbors I want is the red dot in the bottom left corner. The results given by KDTree are the blue squares. The green diamond is the 4th point I would like to get instead of the lone blue one at the top of the image.
Code:
>>> grid.head()
x y
0 0.000000 -9.490125
1 0.959131 -9.490125
2 1.918263 -9.490125
3 2.877394 -9.490125
4 3.836526 -9.490125
>>> pt
[4.0092010999999998e-05, -9.4901299629261011]
>>> import scipy.spatial as ssp
>>> tree = ssp.KDTree(grid)
>>> dis, idx = tree.query(pt, 4)
>>> idx
array([ 0, 71, 1, 142])
>>> grid.iloc[idx]
x y
0 0.000000 -9.490125
71 0.000000 -8.980481
1 0.959131 -9.490125
142 0.000000 -8.470837
Is there a way to specify that we want a rectangle shaped array in the query or something? Maybe by specifying that we only want 2 y's for one x?
Upvotes: 0
Views: 1179
Reputation: 6655
First, let us try to create a Minimal, Complete, and Verifiable example:
>>> import pandas as pd
>>> import numpy as np
>>> x0, dx = 0, 0.959131
>>> x = x0 + dx*np.arange(5)
>>> y0, dy = -9.4901299629261011, 8.980481-8.470837
>>> y = y0 + dy*np.arange(3)  # three rows, matching the 15-point grid below
>>> data = np.transpose([np.tile(x, len(y)), np.repeat(y, len(x))])
>>> grid = pd.DataFrame(data=data, columns=['x', 'y'])
>>> grid.head()
x y
0 0.000000 -9.49013
1 0.959131 -9.49013
2 1.918262 -9.49013
3 2.877393 -9.49013
4 3.836524 -9.49013
where grid is the numerical equivalent of the picture in the question:
>>> grid
x y
0 0.000000 -9.490130 # the red dot
1 0.959131 -9.490130 # the bottom right blue square
2 1.918262 -9.490130
3 2.877393 -9.490130
4 3.836524 -9.490130
5 0.000000 -8.980486 # the middle left blue square
6 0.959131 -8.980486 # the green diamond
7 1.918262 -8.980486
8 2.877393 -8.980486
9 3.836524 -8.980486
10 0.000000 -8.470842 # the unwanted top left blue square
11 0.959131 -8.470842
12 1.918262 -8.470842
13 2.877393 -8.470842
14 3.836524 -8.470842
Thus, you want the points 1, 5 and 6 as the neighborhood of point 0.
To do so, you may want to have a look at the function kneighbors_graph of the sklearn.neighbors module, which computes the graph of k-nearest neighbors. Set the power parameter p of the Minkowski metric to a value greater than 2, say 3. The idea of taking p>2 is to shrink the Euclidean square-root-of-2 ratio between the diagonal and the side of a unit square toward 1, so that the diagonal neighbor ranks ahead of the point two rows up. Doing so as follows
>>> from sklearn.neighbors import kneighbors_graph
>>> _3n_graph = kneighbors_graph(grid,
...                              n_neighbors=3,
...                              p=3,
...                              mode='connectivity',
...                              include_self=False)
yields
>>> grid.iloc[_3n_graph[0].indices]
x y
5 0.000000 -8.980486
1 0.959131 -9.490130
6 0.959131 -8.980486
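If you would rather stay with scipy, the same idea can probably be applied directly to the tree from the question, since KDTree.query also accepts a Minkowski power parameter p. The following is only a sketch under the assumptions of the example above: it rebuilds the 5 x 3 grid, uses the query point from the question, and asks for k=4 neighbors because the query point is not itself part of the tree, so grid point 0 comes back as well. The names points, idx_p2 and idx_p3 are made up for this illustration.

```python
import numpy as np
from scipy.spatial import KDTree

# Rebuild the 5 x 3 grid of the example (x varies fastest)
x0, dx = 0.0, 0.959131
y0, dy = -9.4901299629261011, 8.980481 - 8.470837
x = x0 + dx * np.arange(5)
y = y0 + dy * np.arange(3)
points = np.transpose([np.tile(x, len(y)), np.repeat(y, len(x))])

# Query point from the question, almost on top of grid point 0
pt = [4.0092010999999998e-05, -9.4901299629261011]

tree = KDTree(points)

# k=4: the closest grid point (index 0) plus three neighbors
_, idx_p2 = tree.query(pt, k=4, p=2)  # Euclidean metric (the default)
_, idx_p3 = tree.query(pt, k=4, p=3)  # Minkowski metric with p=3

print(sorted(idx_p2.tolist()))
print(sorted(idx_p3.tolist()))
```

On this particular grid, the Euclidean query picks up point 10 (two rows up), while the p=3 query picks up point 6 (the diagonal neighbor) instead. Note that this depends on the ratio between the two grid steps: for a more strongly stretched grid, p=3 may not be enough and a larger p would be needed.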
Upvotes: 2