Amol Agrawal

Reputation: 185

Classification using Approximate Nearest Neighbors in Scikit-Learn

I have a labeled dataset with 46 features and around 5000 samples that I want to classify using Approximate Nearest Neighbors (ANN).

Since I'm familiar with Scikit-Learn, I want to utilize it to achieve this goal.

The scikit-learn documentation lists LSHForest as one possible method for ANN, but it's unclear to me how to apply it to classification.

Upvotes: 2

Views: 981

Answers (1)

lejlot

Reputation: 66835

Very nice question. Unfortunately, scikit-learn does not seem to support a custom neighbor model at the moment. You can, however, implement a simple wrapper on your own, such as:

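import numpy as np
from scipy.stats import mode
# Note: LSHForest was deprecated in scikit-learn 0.19 and removed in 0.21,
# so this import only works on older releases.
from sklearn.neighbors import LSHForest

class LSH_KNN:
    """k-NN classifier that uses LSHForest for the approximate neighbor search."""

    def __init__(self, n_neighbors=5, **kwargs):
        # Take n_neighbors explicitly so a missing argument does not raise KeyError.
        self.n_neighbors = n_neighbors
        self.lsh = LSHForest(n_neighbors=n_neighbors, **kwargs)

    def fit(self, X, y):
        # Store labels as an array so they can be indexed by neighbor indices.
        self.y = np.asarray(y)
        self.lsh.fit(X)
        return self

    def predict(self, X):
        _, indices = self.lsh.kneighbors(X, n_neighbors=self.n_neighbors)
        # Majority vote over the labels of each sample's neighbors.
        votes, _ = mode(self.y[indices], axis=1)
        return np.asarray(votes).ravel()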
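
To make the voting step concrete without depending on any particular index, here is a minimal pure-Python sketch of the same idea. It assumes an exact brute-force search as a stand-in for the approximate index (so it is not LSH itself), and the helper names `k_nearest_indices`, `majority_vote` and `predict` are made up for illustration, not part of scikit-learn.

```python
from collections import Counter
import math

def k_nearest_indices(train, query, k):
    """Exact brute-force neighbor search (stand-in for an approximate index)."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    return order[:k]

def majority_vote(labels, neighbor_indices):
    """Return the most common label among the given neighbors."""
    counts = Counter(labels[i] for i in neighbor_indices)
    return counts.most_common(1)[0][0]

def predict(train, labels, queries, k=3):
    """Classify each query by majority vote over its k nearest training points."""
    return [majority_vote(labels, k_nearest_indices(train, q, k)) for q in queries]

# Toy data: two well-separated clusters with labels "a" and "b".
X_train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
y_train = ["a", "a", "a", "b", "b", "b"]
predictions = predict(X_train, y_train, [(0.05, 0.05), (5.0, 5.1)], k=3)
```

Any neighbor search that returns indices into the training set could probably be swapped in for `k_nearest_indices`; the classification part is only the vote over `labels`.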

Upvotes: 4
