Reputation: 185
I have a labeled dataset with a 46-dimensional feature set and around 5000 samples that I want to classify using Approximate Nearest Neighbors (ANN).
Since I'm familiar with scikit-learn, I want to use it to achieve this goal.
The scikit-learn documentation lists LSHForest as one of the possible methods for ANN, but it's unclear to me how to apply it to classification.
Upvotes: 2
Views: 981
Reputation: 66835
Very nice question. Unfortunately, scikit-learn does not currently seem to support plugging a custom neighbor model into its nearest-neighbor classifier. You can, however, implement a simple wrapper on your own, such as:
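from sklearn.neighbors import LSHForest  # deprecated in scikit-learn 0.19, removed in 0.21
import numpy as np
from scipy.stats import mode


class LSH_KNN:
    """k-NN classifier that uses LSHForest for the approximate neighbor search."""

    def __init__(self, **kwargs):
        # Fall back to 5 (LSHForest's own default) if n_neighbors is not given
        self.n_neighbors = kwargs.get('n_neighbors', 5)
        self.lsh = LSHForest(**kwargs)

    def fit(self, X, y):
        # Store labels as an array so they can be indexed by neighbor indices
        self.y = np.asarray(y)
        self.lsh.fit(X)
        return self

    def predict(self, X):
        # indices has shape (n_queries, n_neighbors)
        _, indices = self.lsh.kneighbors(X, n_neighbors=self.n_neighbors)
        # Majority vote over the labels of each query's neighbors
        votes, _ = mode(self.y[indices], axis=1)
        return np.ravel(votes)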
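Since LSHForest is not available in recent scikit-learn releases, here is a minimal sketch of the same wrapper idea that depends only on NumPy. It assumes the neighbor index is any object exposing `fit(X)` and `kneighbors(X, n_neighbors)` returning `(distances, indices)`. The names `BruteForceIndex` and `NeighborVoteClassifier` are hypothetical, and the brute-force index is only an exact-search stand-in for whichever ANN index you actually use.

```python
import numpy as np


class BruteForceIndex:
    """Hypothetical stand-in for an ANN index; does exact search with NumPy."""

    def fit(self, X):
        self._X = np.asarray(X, dtype=float)
        return self

    def kneighbors(self, X, n_neighbors):
        X = np.asarray(X, dtype=float)
        # Pairwise squared Euclidean distances, shape (n_queries, n_train)
        d2 = ((X[:, None, :] - self._X[None, :, :]) ** 2).sum(axis=2)
        indices = np.argsort(d2, axis=1)[:, :n_neighbors]
        distances = np.sqrt(np.take_along_axis(d2, indices, axis=1))
        return distances, indices


class NeighborVoteClassifier:
    """Majority-vote classifier on top of any object exposing fit/kneighbors."""

    def __init__(self, index, n_neighbors=5):
        self.index = index
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # Encode labels as 0..n_classes-1 so votes can be counted with bincount
        self.classes_, inverse = np.unique(np.asarray(y), return_inverse=True)
        self._y = inverse.ravel()
        self.index.fit(X)
        return self

    def predict(self, X):
        _, indices = self.index.kneighbors(X, n_neighbors=self.n_neighbors)
        neighbor_labels = self._y[indices]  # shape (n_queries, n_neighbors)
        counts = np.apply_along_axis(
            np.bincount, 1, neighbor_labels, minlength=len(self.classes_)
        )
        # argmax breaks ties in favor of the smallest class index
        return self.classes_[counts.argmax(axis=1)]


# Toy data: two well-separated clusters in 2D
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_train = np.array(["a", "a", "a", "b", "b", "b"])

clf = NeighborVoteClassifier(BruteForceIndex(), n_neighbors=3).fit(X_train, y_train)
pred = clf.predict(np.array([[0.05, 0.05], [5.05, 5.0]]))
```

Counting votes with `np.bincount` on encoded labels avoids relying on `scipy.stats.mode`, whose output shape differs between SciPy versions. Swapping in a real ANN index should only require that it follow the same `fit`/`kneighbors` convention.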
Upvotes: 4