Lucy

Reputation: 21

How to calculate an average value based on K-nearest neighbors?

I would like to write a function to calculate an average 'z' value based on K nearest neighbors (in this case K=2). I have the indices but can someone help me write a function for calculating the average z value for all the neighbors?

This is what I have so far:

import numpy as np
from sklearn.neighbors import NearestNeighbors

# X is an array containing x,y,z values
X = np.array([[6,-3, 0.1], [-5,-9, 0.5], [3,-7, 0.8], [-10,6, 0.5], [-4,-16, 0.9], [1,-0.5, 0]])

# nbrs reads in the x,y values only
nbrs = NearestNeighbors(n_neighbors=2).fit(X[:, :2])
distances, indices = nbrs.kneighbors(X[:, :2])

print(indices)
# pseudocode for the output:
# [[0, index for neighbor1, index for neighbor2]
#  [1, index for neighbor1, index for neighbor2]
#  [2, index for neighbor1, index for neighbor2]
#  [3, index for neighbor1, index for neighbor2]
#  ......
#  etc. for all 6 points in X
# ]

Now that I have the indices, I'd like to calculate the average z value over the neighbors of each point. I recognize there are only 2 neighbors here, so it is easy to average by hand, but if we changed it to 50 neighbors, can someone help me scale this up?

Upvotes: 1

Views: 197

Answers (2)

Nick ODell

Reputation: 25454

If you want to predict a continuous value from the values of the nearest neighbors, you can use KNeighborsRegressor.

Example:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor
X = np.array([[6,-3, 0.1], [-5,-9, 0.5], [3,-7, 0.8], [-10,6, 0.5], [-4,-16, 0.9], [1,-0.5, 0]])
neigh = KNeighborsRegressor(n_neighbors=2, weights='uniform')
neigh.fit(X[:, :2], X[:, 2])
neigh.predict([[4, -7]])

Since you're asking for an average of all neighbors, I used weights='uniform'. An alternative to this is weights='distance', which gives closer neighbors more weight.
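
To make the difference between the two weights settings concrete, here is a small sketch that fits both variants on the same data and predicts at the query point [4, -7] from the example. The variable names (xy, z, query, uniform_pred, distance_pred) are only illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[6, -3, 0.1], [-5, -9, 0.5], [3, -7, 0.8],
              [-10, 6, 0.5], [-4, -16, 0.9], [1, -0.5, 0]])
xy, z = X[:, :2], X[:, 2]      # features are x,y; target is z
query = np.array([[4, -7]])    # same query point as in the example

# weights='uniform': plain average of the z values of the 2 nearest points.
uniform_model = KNeighborsRegressor(n_neighbors=2, weights='uniform').fit(xy, z)
uniform_pred = uniform_model.predict(query)[0]

# weights='distance': each neighbor is weighted by the inverse of its distance,
# so the closer neighbor pulls the prediction toward its own z value.
distance_model = KNeighborsRegressor(n_neighbors=2, weights='distance').fit(xy, z)
distance_pred = distance_model.predict(query)[0]

print(uniform_pred, distance_pred)
```

For this query the two nearest points are [3, -7] (z = 0.8) and [6, -3] (z = 0.1), so the uniform prediction is their plain mean, while the distance-weighted prediction sits closer to 0.8 because [3, -7] is much nearer.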

Upvotes: 1

proof-of-correctness

Reputation: 324

To find the average z value of neighbors of each point in X, you can do:

all_z_pairs = [[X[index][2] for index in row] for row in indices]
mean_values = [sum(z_pair)/len(z_pair) for z_pair in all_z_pairs]

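X[index] is a neighbor and X[index][2] is that neighbor's z-value. Thus all_z_pairs holds, for each point in X, the z values of all of its neighbors. (The name says "pairs" because K=2 here, but the same code works for any number of neighbors.)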

sum(z_pair)/len(z_pair) finds the mean. You can also do this to make it more readable:

from statistics import mean

...
mean_values = [mean(z_pair) for z_pair in all_z_pairs]
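
If K grows to something like 50, the same averaging can probably be done without Python loops by using NumPy integer-array indexing. The following is a sketch, assuming the neighbor search is meant to use only the x,y columns and that K is a variable you choose (it cannot exceed the number of points).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[6, -3, 0.1], [-5, -9, 0.5], [3, -7, 0.8],
              [-10, 6, 0.5], [-4, -16, 0.9], [1, -0.5, 0]])
K = 2  # number of neighbors; e.g. 50 on a larger data set

# Search for neighbors using the x,y columns only (assumption).
nbrs = NearestNeighbors(n_neighbors=K).fit(X[:, :2])
distances, indices = nbrs.kneighbors(X[:, :2])

# indices has shape (n_points, K). Indexing column 2 of X with it
# returns an array of the same shape holding each neighbor's z value.
neighbor_z = X[indices, 2]

# Average across the K neighbors of each point.
mean_values = neighbor_z.mean(axis=1)
print(mean_values)
```

Because the training points themselves are passed to kneighbors, each row of indices contains the point's own index (it is at distance 0), so each mean here includes the point's own z value, just as the list-comprehension version does.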

You can rewrite the all_z_pairs calculation as the following loop if that is easier to understand:

all_z_pairs = []
for row in indices:
    z_values = []
    for index in row:
        z_values.append(X[index][2])
    all_z_pairs.append(z_values)

The indices array has one row for each point in X, and each row lists the neighbors of that point. So the outer loop goes over each point's set of neighbors, and the inner loop goes over every neighbor within that set.
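
One thing to watch for: when kneighbors is called on the same points the model was fitted on, each point is returned as its own nearest neighbor, so its z value ends up in its own average. If that is not wanted, one option is to call kneighbors() with no argument, in which case scikit-learn does not count a query point as its own neighbor. This is a sketch under the assumption that neighbors are found from the x,y columns only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[6, -3, 0.1], [-5, -9, 0.5], [3, -7, 0.8],
              [-10, 6, 0.5], [-4, -16, 0.9], [1, -0.5, 0]])
K = 2  # number of *other* points to average over

nbrs = NearestNeighbors(n_neighbors=K).fit(X[:, :2])

# No argument: the fitted points are used as queries, and each point
# is left out of its own neighbor list.
distances, indices = nbrs.kneighbors()

# Mean z over the K nearest other points, one value per point in X.
mean_values = X[indices, 2].mean(axis=1)
print(indices)
print(mean_values)
```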

Upvotes: 1
