Reputation: 4931
I have to apply Nearest Neighbors in Python, and I am looking at the scikit-learn and scipy libraries, both of which take the raw data as input, then compute the distances and apply the algorithm.
In my case I have to compute a non-conventional distance, so I would like to know whether there is a way to feed in the distance matrix directly.
Upvotes: 8
Views: 8863
Reputation: 7212
You can pass your own distance matrix to sklearn.neighbors.NearestNeighbors if you set metric="precomputed". As the following example shows, with the Euclidean metric the results are identical to those obtained by passing the features directly.
import numpy as np
from numpy.testing import assert_array_equal
from scipy.spatial.distance import cdist
from sklearn.neighbors import NearestNeighbors
# Generate random vectors to use as data for k-nearest neighbors.
rng = np.random.default_rng(0)
X = rng.random((10, 2))
# Fit NearestNeighbors on vectors and retrieve neighbors.
knn_vector_based = NearestNeighbors(n_neighbors=2).fit(X)
nn_1 = knn_vector_based.kneighbors(return_distance=False)
# Calculate distance matrix.
# This computation can be replaced with any custom distance metric you have.
distance_matrix = cdist(X, X)
# Fit NearestNeighbors on distance matrix and retrieve neighbors.
knn_distance_based = (
    NearestNeighbors(n_neighbors=2, metric="precomputed")
    .fit(distance_matrix)
)
nn_2 = knn_distance_based.kneighbors(return_distance=False)
# Verify that the result is the same.
assert_array_equal(nn_1, nn_2)
# Neighbors for single points can be retrieved by passing
# a subset of the original distance matrix.
nn_of_first_point_1 = knn_vector_based.kneighbors(
    X[0, None], return_distance=False
)
nn_of_first_point_2 = knn_distance_based.kneighbors(
    distance_matrix[0, None], return_distance=False
)
assert_array_equal(nn_of_first_point_1, nn_of_first_point_2)
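The same approach extends to points that were not part of the fitted set: with metric="precomputed", the query matrix has one row per query point and one column per indexed point. The following is only a sketch under assumptions: custom_distance and pairwise are hypothetical helpers, and Manhattan distance stands in for whatever non-conventional distance you actually have.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def custom_distance(a, b):
    """Hypothetical stand-in for a non-conventional distance (here: Manhattan)."""
    return float(np.abs(a - b).sum())


def pairwise(A, B):
    """Distance matrix of shape (len(A), len(B)) built with custom_distance."""
    return np.array([[custom_distance(a, b) for b in B] for a in A])


rng = np.random.default_rng(0)
X_indexed = rng.random((10, 2))  # points the model is fitted on
X_query = rng.random((3, 2))     # new points to look up

# Fit on the square (n_indexed, n_indexed) distance matrix.
knn = NearestNeighbors(n_neighbors=2, metric="precomputed")
knn.fit(pairwise(X_indexed, X_indexed))

# Query with a rectangular (n_queries, n_indexed) distance matrix.
distances, indices = knn.kneighbors(pairwise(X_query, X_indexed))
```

Each row of indices then refers to positions in X_indexed, so the original vectors are only needed to build the two distance matrices.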
Upvotes: 2
Reputation: 365
To add to ford's answer: you have to write it like this:
metric = DistanceMetric.get_metric('pyfunc', func=/your function name/)
You cannot simply pass your own function as the second positional argument; it must be passed by keyword, as func.
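For illustration, a minimal sketch of the keyword form. The function chebyshev_like is a made-up example distance, and the try/except import is an assumption that, depending on your scikit-learn version, DistanceMetric lives in either sklearn.metrics or sklearn.neighbors.

```python
import numpy as np

try:  # location in more recent scikit-learn releases
    from sklearn.metrics import DistanceMetric
except ImportError:  # location in older releases
    from sklearn.neighbors import DistanceMetric


def chebyshev_like(a, b):
    """Hypothetical custom distance: largest absolute coordinate difference."""
    return float(np.max(np.abs(a - b)))


# The function is passed by keyword, as func.
metric = DistanceMetric.get_metric("pyfunc", func=chebyshev_like)

X = np.array([[0.0, 0.0], [1.0, 3.0], [4.0, 1.0]])
D = metric.pairwise(X)  # (3, 3) matrix of custom distances between rows of X
```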
Upvotes: 0
Reputation: 11816
You'll want to create a DistanceMetric object, supplying your own function as an argument:
metric = sklearn.neighbors.DistanceMetric.get_metric('pyfunc', func=func)
From the docs:
Here func is a function which takes two one-dimensional numpy arrays, and returns a distance. Note that in order to be used within the BallTree, the distance must be a true metric, i.e. it must satisfy the following properties:
- Non-negativity: d(x, y) >= 0
- Identity: d(x, y) = 0 if and only if x == y
- Symmetry: d(x, y) = d(y, x)
- Triangle Inequality: d(x, y) + d(y, z) >= d(x, z)
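If you are unsure whether your function qualifies, the four properties can be spot-checked numerically on a sample of points. This is only a sketch: looks_like_metric is a hypothetical helper, not part of scikit-learn, and passing it on a finite sample does not prove the function is a metric, although a failure does show that it is not.

```python
import itertools

import numpy as np


def looks_like_metric(func, points, tol=1e-12):
    """Spot-check the four metric properties on a finite sample of points."""
    for x, y in itertools.product(points, repeat=2):
        d_xy = func(x, y)
        if d_xy < 0:  # non-negativity
            return False
        if (d_xy <= tol) != np.array_equal(x, y):  # identity
            return False
        if abs(d_xy - func(y, x)) > tol:  # symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if func(x, y) + func(y, z) < func(x, z) - tol:  # triangle inequality
            return False
    return True


def euclidean(a, b):
    return float(np.sqrt(((a - b) ** 2).sum()))


def squared_euclidean(a, b):
    return float(((a - b) ** 2).sum())


points = np.array([[0.0], [1.0], [2.0]])
print(looks_like_metric(euclidean, points))          # True
print(looks_like_metric(squared_euclidean, points))  # False
```

Squared Euclidean distance fails on the triangle inequality here: the distances 0 to 1 and 1 to 2 sum to 2, while the distance 0 to 2 is 4.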
You can then create your classifier with metric=metric as a keyword argument and it will use this when calculating distances.
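A minimal sketch of the classifier step. It passes a plain Python callable as metric, which KNeighborsClassifier accepts, rather than a DistanceMetric object; the manhattan function and the toy data are made up for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def manhattan(a, b):
    """Example custom distance taking two 1-D arrays and returning a float."""
    return float(np.abs(a - b).sum())


# Toy training set: two well-separated clusters with labels 0 and 1.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 1, 1])

clf = KNeighborsClassifier(n_neighbors=1, metric=manhattan)
clf.fit(X_train, y_train)

# One query point near each cluster.
pred = clf.predict(np.array([[0.2, 0.1], [5.5, 5.2]]))
```

A Python-level distance function is called once per pair of points, so this can be much slower than a built-in metric on large datasets.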
Upvotes: 6
Reputation: 626
As ford said, and according to the documentation (http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier), you should convert your custom distance to a DistanceMetric object and pass it as the metric parameter.
Upvotes: 6