Reputation: 353
I'd like to find a way to determine which neighbors are actually used in my knn algorithm, so I can dive deeper into the rows of data that are similar to my features.
Here is an example of a dataset which I split into a training set and a test set for the prediction model:
Player PER VORP WS
Fabricio Oberto 11.9 1.0 4.1
Eddie Johnson 16.5 1.7 4.8
Tim Legler 15.9 2.0 6.8
Ersan Ilyasova 14.3 0.7 3.8
Kevin Love 25.4 3.5 10.0
Tim Hardaway 20.6 5.1 11.7
Frank Brickowsk 8.6 -0.2 1.6
etc....
And here is an example of my knn algorithm code:
features = ['PER','VORP']
knn = KNeighborsRegressor(n_neighbors=5, algorithm='brute')
knn.fit(train[features], train['WS'])
predictions = knn.predict(test[features])
Now, I'm aware that the algorithm will iterate over each row and make each target prediction based on the 5 closest neighbors that come from the target features I've specified.
I'd like to find out WHICH 5 n_neighbors were actually used in determining my target feature? In this case - which players were actually used in determining the target?
Is there a way to get a list of the 5 neighbors (aka players) which were used in the analysis for each row?
Upvotes: 1
Views: 869
Reputation: 36684
knn.kneighbors
will return you an array of the corresponding nearest neighbours.
Upvotes: 1