ggaurav
ggaurav

Reputation: 1804

Custom scikit-learn scorer can't access mean after fit

I am trying to create a custom estimator based on scikit learn. I have written the below dummy code to explain my problem. In the score method, I am trying to access mean_ calulated in fit. But I am unable to. What I am doing wrong? I have tried many things and have done this referring three four articles. But didn't find the issue.

I have read the documentation and did few changes. But nothing worked. I have also tried inheriting BaseEstimator, ClassifierMixin. But that also didn't work.

This a dummy program. Don't go by what it is trying to do.

import numpy as np
from sklearn.model_selection import cross_val_score


class FilterElems:
    def __init__(self, thres):
        self.thres = thres

    def fit(self, X, y=None, **kwargs):
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self

    def predict(self, X):
        #         return sign(self.predict(inputs))
        X = (X - self.mean_) / self.std_
        return X[X > self.thres]

    def get_params(self, deep=False):
        return {'thres': self.thres}

    def score(self, *x):
        print(self.mean_)  # errors out, mean_ and std_ are wiped out
        if len(x[1]) > 50:
            return 1.0
        else:
            return 0.5


model = FilterElems(thres=0.5)
print(cross_val_score(model,
                      np.random.randint(1, 1000, (100, 100)),
                      None,
                      scoring=model.score,
                      cv=5))

Err:

AttributeError: 'FilterElems' object has no attribute 'mean_'

Upvotes: 5

Views: 472

Answers (2)

mujjiga
mujjiga

Reputation: 16856

You are almost there.

The signature for scorer is scorer(estimator, X, y). The cross_val_score calls the scorer method by passing the estimator object as the first parameter. Since your signature of scorer is a variable argument function, the first item will hold the estimator

change your score to

def score(self, *x):
    print(x[0].mean_)
    if len(x[1]) > 50:
        return 1.0
    else:
        return 0.5

Working code

import numpy as np
from sklearn.model_selection import cross_val_score

class FilterElems:
    def __init__(self, thres):
        self.thres = thres

    def fit(self, X, y=None, **kwargs):
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self

    def predict(self, X):
        X = (X - self.mean_) / self.std_
        return X[X > self.thres]

    def get_params(self, deep=False):
        return {'thres': self.thres}

    def score(self, estimator, *x):
        print(estimator.mean_, estimator.std_) 
        if len(x[0]) > 50:
            return 1.0
        else:
            return 0.5

model = FilterElems(thres=0.5)
print(cross_val_score(model,
                      np.random.randint(1, 1000, (100, 100)),
                      None,
                      scoring=model.score,
                      cv=5))

Outout

504.750125 288.84916035447355
501.7295 289.47825925231416
503.743375 288.8964170227962
503.0325 287.8292687406025
500.041 289.3488678377712
[0.5 0.5 0.5 0.5 0.5]

Upvotes: 3

Venkatachalam
Venkatachalam

Reputation: 16966

The input for scoring param in cross_val_score needs to str or callable with signature scoring(estimator, X, y). In your case, you don't seems to need the y, hence you can leave that in your callable. Also, you need to ensure that the output of the score has to be single value.

The solution would look something like this for your problem.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.base import TransformerMixin

class FilterElems(TransformerMixin):
    def __init__(self, thres):
        self.thres = thres

    def fit(self, X, y=None, **kwargs):
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self

    def predict(self, X):
        #         return sign(self.predict(inputs))
        X = (X - self.mean_) / self.std_
        return X[X > self.thres]

    def get_params(self, deep=False):
        return {'thres': self.thres}


def scorer(tranformer, X):
    print(tranformer.mean_)  # Now it prints out, mean_ and std_ 
    result=[]
    for x in X:
        # do the stuff you want here
        if x[1] > 50:
            result.append(1)
        else:
            result.append(0.5)
    # but return a single value
    return np.mean(result)

np.random.seed(1)
model = FilterElems(thres=0.5)
print(cross_val_score(model,
                      np.random.randint(1, 1000, (100, 100)),
                      None,
                      scoring=scorer,
                      cv=5))

# [0.95  1.    1.    0.975 0.975]

Upvotes: 2

Related Questions