Reputation: 109
Here is the code.
from sklearn.neighbors import NearestNeighbors
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
distances, indices = nbrs.kneighbors(X)
>indices
>array([[0, 1],[1, 0],[2, 1],[3, 4],[4, 3],[5, 4]])
>distances
>array([[0. , 1. ],[0. , 1. ],[0. , 1.41421356], [0. , 1. ],[0. , 1. ],[0. , 1.41421356]])
I don't really understand the shape of 'indices' and 'distances'. How do I understand what these numbers mean?
Upvotes: 7
Views: 6741
Reputation: 809
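To add to the answers above, here is how to collect the n_neighbors=2 neighbors of every point into a pandas DataFrame using the indices array. Since X is a NumPy array here, index it directly (X.iloc only exists on pandas objects):
import pandas as pd
# One row per (query point, neighbor) pair, in row-major order of indices
df = pd.DataFrame([X[indices[row, col]] for row in range(indices.shape[0]) for col in range(indices.shape[1])])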
Upvotes: 1
Reputation: 1637
Maybe a little sketch will help
As an example, the closest point to the training sample with index 0 is 1, and since you are using n_neighbors = 2 (two neighbors) you would expect to see this pair in the results. And indeed you see that the pair [0, 1] appears in the output.
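To check those pairs by hand, here is a minimal sketch that recomputes the result with plain NumPy. It uses a brute-force distance matrix rather than the ball tree that NearestNeighbors builds, and the names pairwise, brute_indices and brute_distances are made up for illustration.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Pairwise Euclidean distances between all points, shape (6, 6).
diff = X[:, None, :] - X[None, :, :]
pairwise = np.sqrt((diff ** 2).sum(axis=-1))

# For each row (query point), keep the k smallest distances.
# Brute force for illustration only; this is not sklearn's ball tree.
k = 2
brute_indices = np.argsort(pairwise, axis=1, kind="stable")[:, :k]
brute_distances = np.take_along_axis(pairwise, brute_indices, axis=1)

print(brute_indices)    # row 0 is [0, 1]: the point itself, then point 1
print(brute_distances)  # row 0 is [0., 1.]
```

For this data set the two arrays should match the indices and distances printed in the question.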
Upvotes: 4
Reputation: 36599
It's pretty straightforward, actually. For each data sample in the input to kneighbors() (X here), it returns 2 neighbors, because you specified n_neighbors=2. indices gives you the index of each neighbor in the training data (again X here), and distances gives you the distance from the query point to the corresponding training point (the one the index refers to).
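The roles of "input to kneighbors()" and "training data" are easier to tell apart when the two differ. A small sketch, assuming a made-up queries array that is not part of the question:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)

# Hypothetical query points, chosen so that none coincides with a row of X.
queries = np.array([[0.9, 1.2], [-2.8, -1.9], [2.6, 1.8]])
q_distances, q_indices = nbrs.kneighbors(queries)

print(q_indices.shape)    # (3, 2): one row per query, one column per neighbor
print(q_distances.shape)  # (3, 2)
print(q_indices)          # every value is a row number of X, not of queries
print(q_distances)
```

Because no query point is itself in X, the first column of q_distances is no longer zero.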
Take a single data point as an example. With X[0] as the first query point, the answer is indices[0] and distances[0]. So for X[0]:
The index of the first nearest neighbor in the training data is indices[0, 0] = 0, and its distance is distances[0, 0] = 0. You can use this index to get the actual data sample from the training data. This makes sense: you used the same data for training and querying, so the first nearest neighbor of each point is the point itself, at distance 0.
The index of the second nearest neighbor is indices[0, 1] = 1, and its distance is distances[0, 1] = 1.
The same holds for all other points. The first dimension of indices and distances corresponds to the query points, and the second dimension to the number of neighbors requested, so both arrays have shape (n_queries, n_neighbors), here (6, 2).
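As a rough illustration of using the index values to get back the actual samples, the sketch below copies the indices array from the question's output and indexes X with it. The names second_neighbor_of_0 and all_neighbors are only for illustration.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Copied from the output shown in the question.
indices = np.array([[0, 1], [1, 0], [2, 1], [3, 4], [4, 3], [5, 4]])

# Second nearest neighbor of X[0]: row indices[0, 1] of the training data.
second_neighbor_of_0 = X[indices[0, 1]]
print(second_neighbor_of_0)  # [-2 -1]

# Indexing with the whole array returns every neighbor's coordinates at once:
# axis 0 = query point, axis 1 = neighbor rank, axis 2 = feature.
all_neighbors = X[indices]
print(all_neighbors.shape)   # (6, 2, 2)
```

Here all_neighbors[i, j] is the j-th nearest neighbor of X[i], so all_neighbors[:, 0] is X itself.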
Upvotes: 10