Fagui Curtain

Reputation: 1917

Sklearn kNN usage with a user defined metric (again)

Someone posted a similar question here, but I couldn't get my code working from its answers.

See:

Sklearn kNN usage with a user defined metric

I want to define my own metric (user_metric) and use it in KNN.
It seems I have a signature problem, but I don't understand it. Thanks.

gamma=2


def mydist2 (x,y):
    z=(x-y)
    return (z[0]^2+gamma*z[1]^2) 
neigh = KNeighborsClassifier(n_neighbors=3,metric=mydist2)

neigh.fit(traindata,train_labels)
neigh.score(testdata,test_labels)

ValueError                                Traceback (most recent call last)
<ipython-input-81-f934c7b5c9b3> in <module>()
----> 1 neigh.fit(traindata,train_labels)
      2 neigh.score(testdata,test_labels)

C:\Users\Fagui\Anaconda2\lib\site-packages\sklearn\neighbors\base.pyc in fit(self, X, y)
    801             self._y = self._y.ravel()
    802
    803         return self._fit(X)
    804
    805

C:\Users\Fagui\Anaconda2\lib\site-packages\sklearn\neighbors\base.pyc in _fit(self, X)
    256             self._tree = BallTree(X, self.leaf_size,
    257                                   metric=self.effective_metric_,
--> 258                                   **self.effective_metric_params_)
    259         elif self._fit_method == 'kd_tree':
    260             self._tree = KDTree(X, self.leaf_size,

sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.ball_tree.BinaryTree.__init__ (sklearn\neighbors\ball_tree.c:8381)()

sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.DistanceMetric.get_metric (sklearn\neighbors\dist_metrics.c:4032)()

sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.PyFuncDistance.__init__ (sklearn\neighbors\dist_metrics.c:10628)()

ValueError: func must be a callable taking two arrays

As a bonus question, I'd like to pass gamma as an argument.

Thanks very much.

Upvotes: 1

Views: 3721

Answers (3)

Réda Boumahdi

Reputation: 66

Define the metric in Cython, build the module to create the library, and call it from your main code.

Sklearn is optimized: it uses Cython and several processes to run as fast as possible. Pure Python code, especially when it is called many times, will slow your code down. I recommend that you write your custom metric in Cython. There is a tutorial that you can follow right here.

Upvotes: -1

Fagui Curtain

Reputation: 1917

My question was very stupid.

The syntax was correct.

The problem is that exponentiation in Python is written with **, not with ^ (which is bitwise XOR).

Hence 16 = 2**4, not 2^4.
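For anyone landing here later, a minimal sketch of the difference and of the corrected metric. It uses plain Python sequences instead of NumPy arrays so it runs on its own; the names xor_result, power_result and float_xor_error are just for illustration. Applying ^ to floats raises a TypeError, which may be why scikit-learn complained about the callable (that last part is my assumption, it is not confirmed in this thread).

```python
gamma = 2


def mydist2(x, y):
    """Weighted squared distance on the first two coordinates."""
    z0 = x[0] - y[0]
    z1 = x[1] - y[1]
    # ** is exponentiation; ^ would be bitwise XOR
    return z0 ** 2 + gamma * z1 ** 2


# On integers, ^ silently gives the XOR of the bit patterns
xor_result = 2 ^ 4     # 0b010 XOR 0b100 == 0b110
power_result = 2 ** 4  # two to the fourth power

# On floats, ^ is not defined at all and raises TypeError
try:
    2.0 ^ 2
    float_xor_error = None
except TypeError as exc:
    float_xor_error = type(exc).__name__

print(xor_result, power_result, float_xor_error)
print(mydist2([1.0, 2.0], [0.0, 0.0]))
```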

Upvotes: 2

arthur

Reputation: 2399

From the KNeighborsClassifier documentation: the metric argument must be a string or a DistanceMetric object, and you gave a function.

To pass your own metric, you have to specify metric='pyfunc' and add the keyword argument func=mydist2.

In the similar question, they explain that a custom metric can only be used when algorithm='ball_tree' is set, and you kept the default, which is 'auto'.

I think that the following should work:

neigh = KNeighborsClassifier(n_neighbors=3, algorithm='ball_tree',metric='pyfunc', func=mydist2)

To pass gamma as an argument, I would try:

def mydist2(x, y, gamma=2):
    z = x - y
    return z[0]**2 + gamma*z[1]**2

and add the argument metric_params={'gamma':2}:

neigh = KNeighborsClassifier(n_neighbors=3, algorithm='ball_tree',metric='pyfunc', func=mydist2, metric_params={'gamma':2} )

But I'm not sure; there is no clear example in the docs.
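As a rough sketch of the same idea on a more recent scikit-learn: as far as I know, newer releases accept a callable directly as metric and forward the entries of metric_params to it as keyword arguments, so 'pyfunc' and func= are not needed there. The toy arrays below are hypothetical stand-ins for traindata and train_labels, and the square root is my own addition so that the function satisfies the triangle inequality that a ball tree relies on (it does not change which neighbors are closest).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def mydist2(x, y, gamma=2.0):
    """Weighted Euclidean distance on the first two features."""
    z = x - y
    # sqrt added here (assumption) so the result is a true metric
    return np.sqrt(z[0] ** 2 + gamma * z[1] ** 2)


# Hypothetical toy data: two well-separated clusters
traindata = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                      [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
train_labels = np.array([0, 0, 0, 1, 1, 1])

neigh = KNeighborsClassifier(
    n_neighbors=3,
    algorithm="ball_tree",
    metric=mydist2,                # callable passed directly
    metric_params={"gamma": 2.0},  # forwarded to mydist2 as gamma=2.0
)
neigh.fit(traindata, train_labels)

# One query point near each cluster
predictions = neigh.predict(np.array([[0.05, 0.05], [5.05, 5.05]]))
print(predictions)
```

If your version rejects this call, check the metric and metric_params entries in the documentation for the release you have installed.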

Upvotes: 2
