gcedo

Reputation: 4931

Nearest Neighbors in Python given the distance matrix

I need to apply nearest neighbors in Python, and I am looking at the scikit-learn and SciPy libraries. Both take the raw data as input, compute the distances themselves, and then apply the algorithm.

In my case I had to compute a non-conventional distance, so I would like to know whether there is a way to feed the distance matrix in directly.

Upvotes: 8

Views: 8863

Answers (4)

Jarno

Reputation: 7212

You can pass your own distance matrix to sklearn.neighbors.NearestNeighbors if you set metric="precomputed". As the following example shows, with the Euclidean metric the results are identical to those obtained by passing the features directly.

import numpy as np
from numpy.testing import assert_array_equal
from scipy.spatial.distance import cdist
from sklearn.neighbors import NearestNeighbors

# Generate random vectors to use as data for k-nearest neighbors.
rng = np.random.default_rng(0)
X = rng.random((10, 2))

# Fit NearestNeighbors on vectors and retrieve neighbors.
knn_vector_based = NearestNeighbors(n_neighbors=2).fit(X)
nn_1 = knn_vector_based.kneighbors(return_distance=False)

# Calculate distance matrix.
# This computation can be replaced with any custom distance metric you have.
distance_matrix = cdist(X, X)

# Fit NearestNeighbors on distance matrix and retrieve neighbors.
knn_distance_based = (
    NearestNeighbors(n_neighbors=2, metric="precomputed")
        .fit(distance_matrix)
)

nn_2 = knn_distance_based.kneighbors(return_distance=False)

# Verify that the result is the same.
assert_array_equal(nn_1, nn_2)

# Neighbors of a single point can be retrieved by passing
# the corresponding row of the original distance matrix.
nn_of_first_point_1 = knn_vector_based.kneighbors(
    X[0, None], return_distance=False
)
nn_of_first_point_2 = knn_distance_based.kneighbors(
    distance_matrix[0, None], return_distance=False
)

assert_array_equal(nn_of_first_point_1, nn_of_first_point_2)
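The same approach carries over to a non-Euclidean distance and to query points that were not part of the fitted data. The following is a minimal sketch, not a definitive implementation: custom_distance is a hypothetical stand-in for your own distance, and the data are random. The key assumption it illustrates is that, with metric="precomputed", the matrix passed to kneighbors has one row per query point and one column per fitted point.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.neighbors import NearestNeighbors


def custom_distance(u, v):
    # Hypothetical stand-in for a non-conventional distance:
    # the largest absolute coordinate difference.
    return float(np.max(np.abs(u - v)))


rng = np.random.default_rng(0)
X_train = rng.random((10, 2))
X_query = rng.random((3, 2))

# Square matrix of distances among the training points.
train_distances = cdist(X_train, X_train, metric=custom_distance)
knn = NearestNeighbors(n_neighbors=2, metric="precomputed").fit(train_distances)

# For unseen points, pass a matrix of shape (n_queries, n_train):
# row i holds the distances from query i to every training point.
query_distances = cdist(X_query, X_train, metric=custom_distance)
distances, indices = knn.kneighbors(query_distances)
```

Here indices[i] lists the positions in X_train of the two training points closest to X_query[i], and distances[i] holds the matching entries of query_distances.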

Upvotes: 2

Tomas Olsson

Reputation: 365

To add to ford's answer: the function has to be passed like this:

metric = DistanceMetric.get_metric('pyfunc',func=/your function name/)

You cannot pass your own function as the second positional argument; it must be passed by keyword, as func.
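As a rough illustration of the keyword form, the sketch below builds a 'pyfunc' metric and evaluates it on three points. The function manhattan is a hypothetical example, and the try/except import is an assumption about where DistanceMetric lives in your installed scikit-learn version (sklearn.metrics in recent releases, sklearn.neighbors in older ones).

```python
import numpy as np

try:
    # Location in recent scikit-learn releases.
    from sklearn.metrics import DistanceMetric
except ImportError:
    # Location in older scikit-learn releases.
    from sklearn.neighbors import DistanceMetric


def manhattan(u, v):
    # Hypothetical custom distance: sum of absolute coordinate differences.
    return np.sum(np.abs(u - v))


# The function must be given by keyword, as func=...
metric = DistanceMetric.get_metric("pyfunc", func=manhattan)

X = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]])

# Pairwise distances among the rows of X under the custom function.
pairwise = metric.pairwise(X)
```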

Upvotes: 0

ford

Reputation: 11816

You'll want to create a DistanceMetric object, supplying your own function as an argument:

metric = sklearn.neighbors.DistanceMetric.get_metric('pyfunc', func=func)

From the docs:

Here func is a function which takes two one-dimensional numpy arrays, and returns a distance. Note that in order to be used within the BallTree, the distance must be a true metric: i.e. it must satisfy the following properties

  • Non-negativity: d(x, y) >= 0
  • Identity: d(x, y) = 0 if and only if x == y
  • Symmetry: d(x, y) = d(y, x)
  • Triangle Inequality: d(x, y) + d(y, z) >= d(x, z)
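If you already have a distance matrix, the properties above can be checked numerically before relying on a tree-based algorithm. This is only a sketch: metric_violations is a hypothetical helper, the tolerance is an arbitrary choice, and the triangle check builds an n x n x n array, so it suits small matrices only. The "only if" half of the identity property cannot be verified from the matrix alone, so the helper checks only that the diagonal is zero.

```python
import numpy as np


def metric_violations(D, tol=1e-12):
    """Return the names of the metric properties that matrix D violates."""
    D = np.asarray(D, dtype=float)
    problems = []

    if np.any(D < -tol):
        problems.append("non-negativity")

    # Only d(x, x) = 0 is checked here; see the note above.
    if np.any(np.abs(np.diag(D)) > tol):
        problems.append("identity")

    if not np.allclose(D, D.T, atol=tol):
        problems.append("symmetry")

    # detours[i, j, k] = D[i, j] + D[j, k]; the shortest detour over j
    # must not be shorter than the direct distance D[i, k].
    detours = D[:, :, None] + D[None, :, :]
    if np.any(D > detours.min(axis=1) + tol):
        problems.append("triangle inequality")

    return problems


# Points 0, 1, 2 on a line: absolute difference versus squared difference.
absolute = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
squared = absolute ** 2

absolute_problems = metric_violations(absolute)
squared_problems = metric_violations(squared)
```

In this example the squared difference fails the triangle inequality (4 > 1 + 1), which is the kind of distance that should not be used with BallTree.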

You can then create your classifier with metric=metric as a keyword argument and it will use this when calculating distances.
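A minimal sketch of the classifier step, under the assumption that a plain Python callable is passed as metric (scikit-learn's neighbor estimators accept one, which sidesteps version differences around the DistanceMetric object). The function manhattan and the toy data are hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def manhattan(u, v):
    # Hypothetical custom distance; it satisfies the metric properties,
    # so it is safe to use with the ball tree.
    return np.sum(np.abs(u - v))


# Two small, well-separated groups of points with labels 0 and 1.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 1, 1])

clf = KNeighborsClassifier(n_neighbors=1, metric=manhattan, algorithm="ball_tree")
clf.fit(X_train, y_train)

# Classify one point near each group.
predictions = clf.predict(np.array([[0.5, 0.2], [5.5, 5.5]]))
```

A Python callable is invoked once per pair of points, so this is noticeably slower than a built-in metric; for larger datasets, precomputing the distance matrix may be the more practical route.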

Upvotes: 6

tk.

Reputation: 626

As ford said, and according to the documentation (http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier), you should convert your custom distance to a DistanceMetric object and pass it as the metric parameter.

Upvotes: 6
