Reputation: 1804
I am trying to create a custom estimator based on scikit-learn. I have written the dummy code below to illustrate my problem. In the score method, I am trying to access mean_, which is calculated in fit, but I am unable to. What am I doing wrong? I have tried many things and followed three or four articles, but could not find the issue.
I have read the documentation and made a few changes, but nothing worked. I have also tried inheriting from BaseEstimator and ClassifierMixin, but that did not work either.
This is a dummy program; don't go by what it is trying to do.
import numpy as np
from sklearn.model_selection import cross_val_score

class FilterElems:
    def __init__(self, thres):
        self.thres = thres

    def fit(self, X, y=None, **kwargs):
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self

    def predict(self, X):
        # return sign(self.predict(inputs))
        X = (X - self.mean_) / self.std_
        return X[X > self.thres]

    def get_params(self, deep=False):
        return {'thres': self.thres}

    def score(self, *x):
        print(self.mean_)  # errors out, mean_ and std_ are wiped out
        if len(x[1]) > 50:
            return 1.0
        else:
            return 0.5

model = FilterElems(thres=0.5)
print(cross_val_score(model,
                      np.random.randint(1, 1000, (100, 100)),
                      None,
                      scoring=model.score,
                      cv=5))
Err:
AttributeError: 'FilterElems' object has no attribute 'mean_'
Upvotes: 5
Views: 472
Reputation: 16856
You are almost there.
The signature for a scorer is scorer(estimator, X, y). cross_val_score calls the scorer and passes a fitted clone of the estimator as the first argument. Since your score method takes variable arguments, the first item of x holds that fitted estimator, while self is still the original, unfitted model that model.score was bound to. Change your score to:
def score(self, *x):
    print(x[0].mean_)
    if len(x[1]) > 50:
        return 1.0
    else:
        return 0.5
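To see why self has no mean_ while the first argument does, here is a minimal sketch in plain Python. The Model class and the run_fold helper are hypothetical stand-ins: run_fold only imitates the relevant behaviour of one cross-validation fold (build a fresh copy, fit it, hand it to the scorer) and is not scikit-learn's actual code.

```python
class Model:
    def __init__(self, thres):
        self.thres = thres

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def score(self, *args):
        # `self` is the instance the method was bound to;
        # args[0] is whatever the caller passes first.
        return hasattr(self, "mean_"), hasattr(args[0], "mean_")


def run_fold(estimator, scorer, X_train, X_test):
    # Hypothetical, simplified stand-in for one fold:
    # create a fresh copy from the constructor parameter, fit it, score it.
    fitted = type(estimator)(estimator.thres).fit(X_train)
    return scorer(fitted, X_test)


model = Model(thres=0.5)
# model.score is bound to the original, never-fitted `model`.
self_fitted, arg_fitted = run_fold(model, model.score, [1, 2, 3], [4, 5])
print(self_fitted, arg_fitted)  # False True
```

The bound method keeps pointing at the original object, so only the first positional argument refers to the fitted copy.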
Working code
import numpy as np
from sklearn.model_selection import cross_val_score

class FilterElems:
    def __init__(self, thres):
        self.thres = thres

    def fit(self, X, y=None, **kwargs):
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self

    def predict(self, X):
        X = (X - self.mean_) / self.std_
        return X[X > self.thres]

    def get_params(self, deep=False):
        return {'thres': self.thres}

    def score(self, estimator, *x):
        print(estimator.mean_, estimator.std_)
        if len(x[0]) > 50:
            return 1.0
        else:
            return 0.5

model = FilterElems(thres=0.5)
print(cross_val_score(model,
                      np.random.randint(1, 1000, (100, 100)),
                      None,
                      scoring=model.score,
                      cv=5))
Output
504.750125 288.84916035447355
501.7295 289.47825925231416
503.743375 288.8964170227962
503.0325 287.8292687406025
500.041 289.3488678377712
[0.5 0.5 0.5 0.5 0.5]
Upvotes: 3
Reputation: 16966
The scoring parameter of cross_val_score needs to be a str or a callable with signature scoring(estimator, X, y). In your case, you don't seem to need y, hence you can leave it out of your callable. Also, you need to ensure that the scorer returns a single value.
The solution for your problem would look something like this.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.base import TransformerMixin

class FilterElems(TransformerMixin):
    def __init__(self, thres):
        self.thres = thres

    def fit(self, X, y=None, **kwargs):
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self

    def predict(self, X):
        # return sign(self.predict(inputs))
        X = (X - self.mean_) / self.std_
        return X[X > self.thres]

    def get_params(self, deep=False):
        return {'thres': self.thres}

def scorer(transformer, X):
    print(transformer.mean_)  # Now it prints; mean_ and std_ are available
    result = []
    for x in X:
        # do the stuff you want here
        if x[1] > 50:
            result.append(1)
        else:
            result.append(0.5)
    # but return a single value
    return np.mean(result)

np.random.seed(1)
model = FilterElems(thres=0.5)
print(cross_val_score(model,
                      np.random.randint(1, 1000, (100, 100)),
                      None,
                      scoring=scorer,
                      cv=5))
# [0.95 1. 1. 0.975 0.975]
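A variation on the same idea, as a sketch only: inherit from BaseEstimator so that get_params and set_params are derived from the __init__ signature, and keep the scorer as a standalone function with an optional y. The fraction_above scorer is a made-up metric for illustration, and the default thres=0.5 is an assumption, not something from the question.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_val_score


class FilterElems(BaseEstimator):
    # BaseEstimator provides get_params/set_params based on __init__.
    def __init__(self, thres=0.5):
        self.thres = thres

    def fit(self, X, y=None):
        self.mean_ = np.mean(X)
        self.std_ = np.std(X)
        return self


def fraction_above(estimator, X, y=None):
    # Hypothetical scorer: share of standardized entries above the threshold.
    # It receives the fitted estimator and must return a single number.
    Z = (np.asarray(X) - estimator.mean_) / estimator.std_
    return float(np.mean(Z > estimator.thres))


rng = np.random.RandomState(0)
X = rng.randint(1, 1000, (100, 100))
scores = cross_val_score(FilterElems(thres=0.5), X, None,
                         scoring=fraction_above, cv=5)
print(scores)
```

Because the scorer is not a method of the model, there is no second, unfitted instance to confuse it with.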
Upvotes: 2