Reputation: 97
I am new to using scipy. I can't quite seem to figure out the metric of distance_upper_bound
in scipy.spatial.KDTree.query
. Is it in Kilometers or Radians?
Upvotes: 0
Views: 1495
Reputation: 33522
It's exactly the value in terms of the chosen metric on which you decided using parameter p
.
If you did chose p=1
, aka manhattan-distance, the distance of the vectors: x=[1,2,3]
and y=[2,3,4]
is 3
. If you would have used distance_upper_bound=2
and y
is the next neighbor to x
you are looking for, don't expect a correct result.
Remark: this parameter you are talking about is set to inf
by default.
Your task seems to be about latitude/longitude points. In this case, i think you want to use the Haversine-metric (disclaimer: i'm no expert in this area).
Sadly, this metric is imho not available in terms of a p-norm, the only ones supported in scipy's neighbor-searches!
But: sklearn's BallTree can work with Haversine! (KDTree does not! Still p-norms!)
There is probably a good reason (either math or practical performance) why KDTree is not supporting Haversine, while BallTree does. Don't try to roll your own Haversine-KDTree blindly!
From here:
A binary search tree cannot handle the wraparound of the polar representation by design. You might need to transform the coordinates to a 3D cartesian space and then apply your favorite search algorithm, e.g., kD-Tree, Octree etc.
from sklearn.neighbors import KDTree, BallTree
KDTree.valid_metrics
# ['euclidean', 'l2', 'minkowski', 'p', 'manhattan', 'cityblock', 'l1', 'chebyshev',
# 'infinity']
BallTree.valid_metrics
# ['euclidean', 'l2', 'minkowski', 'p', 'manhattan', 'cityblock', 'l1', 'chebyshev',
# 'infinity', 'seuclidean', 'mahalanobis', 'wminkowski', 'hamming', 'canberra',
# 'braycurtis', 'matching', 'jaccard', 'dice', 'kulsinski', 'rogerstanimoto', 'russellrao',
# 'sokalmichener', 'sokalsneath', 'haversine', 'pyfunc']
Upvotes: 2