This is a follow-up to my previous question, "How to convert a groupby().mean() into a callable object?".
I am grateful for the help I received from this forum, and from Alberto Garcia-Raboso in particular, who answered my question about this model.
As I proceed, more errors occur, and this one seems hard for me to correct. It concerns evaluating the model's performance. I attempted to use .score(pred_values, real_values), but the error suggests that the input values are not in the [index]:
KeyError: 'None of [[87.333333333333329, 76.0, 81.5, 87.333333333333329, 87.333333333333329, 76.0, 81.5]] are in the [index]'
I am not sure how to interpret this. Where is this index, and how do I access it to fix the problem?
I have been pondering this for a long while, and I still cannot solve it. I would be grateful for any assistance. Thank you.
Model
from sklearn.base import BaseEstimator, ClassifierMixin
import pandas as pd
import numpy as np

class MeanClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self):
        pass

    def fit(self, X, y):
        self.name = X
        self.scores = y
        self.data = pd.DataFrame({"name": self.name, "score": self.scores})
        #print(self.data)
        self.means = self.data.groupby(["name"]).mean()
        #print(self.means)
        return self

    def predict(self, X):
        return list(self.means.loc[X, 'score'])
Data inputs and model testing
names = ["John", "Mary", "Suzie", "John", "John", "Mary", "Suzie"]
scores = [80, 70, 75, 90, 92, 82, 88]
dd = pd.DataFrame({"name": names, "score": scores})
ddnames = list(dd['name'])
ddscores = list(dd['score'])
B = MeanClassifier()
Bfit = B.fit(ddnames, ddscores)
Bpred = B.predict(dd['name'])
#print(Bpred)
print(B.score(Bpred, ddscores)) #The error appears here
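There are two problems in your code. The first one is with the score method. Its signature is

score(X, y[, sample_weight])

where X is your feature set and y holds the true target values. Note that score calls predict(X) itself internally, so you should not pass it predictions. What you supplied was the predicted list and the true list, so score ended up calling predict on the predicted means. That is why the KeyError lists those means: predict tried to use them as row labels in self.means.loc, whose index holds the names. So change that line to simply: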
print(B.score(ddnames, ddscores))
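To make the KeyError concrete, here is a minimal sketch (assuming a recent pandas; the variable names are illustrative) that rebuilds the same per-name means table and performs the lookup that predict does. It tries the lookup first with names and then with the predicted means, which is effectively what the original score(Bpred, ddscores) call did:

```python
import pandas as pd

# Rebuild the table that MeanClassifier.fit stores in self.means:
# the index holds the names, the "score" column holds their means.
means = pd.DataFrame(
    {"name": ["John", "Mary", "Suzie", "John", "John", "Mary", "Suzie"],
     "score": [80, 70, 75, 90, 92, 82, 88]}
).groupby(["name"]).mean()

# predict(X) does means.loc[X, "score"], which works when X holds names.
print(list(means.loc[["John", "Mary"], "score"]))

# score(X, y) first calls predict(X). Passing the predictions as X makes
# .loc look up the float means as row labels, and none of them exist.
try:
    means.loc[[87.333333333333329, 76.0], "score"]
    lookup_failed = False
except KeyError as err:
    lookup_failed = True
    print("KeyError:", err)
```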
But if you run print(B.score(ddnames, ddscores)), you'll get another error:
Can't handle mix of multiclass and continuous
You get this error because you inherit from ClassifierMixin while doing a regression task. In simpler words, your model produces continuous output, but ClassifierMixin treats it as a classification problem: its score method computes accuracy, which expects discrete class labels.
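A quick way to see the mismatch is sketched below with scikit-learn's type_of_target helper and accuracy_score, the metric ClassifierMixin.score uses (the exact error wording may vary between scikit-learn versions). The true scores are integers with more than two distinct values, so they look like multiclass labels, while the group means are non-integer floats, so they look continuous:

```python
from sklearn.metrics import accuracy_score
from sklearn.utils.multiclass import type_of_target

y_true = [80, 70, 75, 90, 92, 82, 88]  # integer scores
y_pred = [87.333333333333329, 76.0, 81.5, 87.333333333333329,
          87.333333333333329, 76.0, 81.5]  # per-name means from predict

# accuracy_score checks that both targets have compatible types.
print(type_of_target(y_true))  # multiclass
print(type_of_target(y_pred))  # continuous

try:
    accuracy_score(y_true, y_pred)
    mix_error = None
except ValueError as err:
    # The two target types cannot be compared as class labels.
    mix_error = str(err)
    print(err)
```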
The fix is to inherit from RegressorMixin instead, whose score method returns the R² coefficient of determination, and you are good to go:
# ... rest of the code unchanged ...
from sklearn.base import BaseEstimator, RegressorMixin

class MeanClassifier(BaseEstimator, RegressorMixin):
    def __init__(self):
        pass
    # ... rest of the code unchanged ...

print(B.score(ddnames, ddscores))
Output:
0.395607701564
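For completeness, here is a self-contained sketch of the corrected estimator. The class name MeanRegressor and the fitted attribute means_ are hypothetical renames (the trailing underscore follows scikit-learn's convention for fitted attributes), and the mixin is listed before BaseEstimator, the order current scikit-learn documentation recommends. It checks that score agrees with r2_score computed directly on the predictions:

```python
import pandas as pd
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.metrics import r2_score


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical rename of MeanClassifier: predicts each name's mean score."""

    def fit(self, X, y):
        data = pd.DataFrame({"name": list(X), "score": list(y)})
        # means_ (hypothetical name): Series of mean score, indexed by name.
        self.means_ = data.groupby("name")["score"].mean()
        return self

    def predict(self, X):
        # X must contain names; each one is looked up in the means_ index.
        return self.means_.loc[list(X)].to_numpy()


names = ["John", "Mary", "Suzie", "John", "John", "Mary", "Suzie"]
scores = [80, 70, 75, 90, 92, 82, 88]

model = MeanRegressor().fit(names, scores)
pred = model.predict(names)

# RegressorMixin.score(X, y) computes the R^2 of predict(X) against y.
r2 = model.score(names, scores)
print(r2)
print(r2_score(scores, pred))  # same value, computed directly
```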