I am comparing a number of pipelines in cross-validation. As a benchmark, I want to include a simple model that always uses the same fixed coefficient and hence doesn't depend on the training data. To get this model, I decided to inherit all of the behaviour of sklearn's LinearRegression and implement my own .fit() method, which doesn't look at the training data at all but always uses a stored model.
When I use my custom implementation as a standalone model it works fine; as part of a pipeline, however, I get a NotFittedError.
Creating my simple benchmark model and storing it:
```python
import numpy as np
import pickle
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

X = np.array([[1], [2], [3]])
y = [10, 20, 30]
model = LinearRegression(fit_intercept=False).fit(X, y)
with open('benchmark_model.txt', 'wb') as f:
    pickle.dump(model, f)
print(model.coef_)
# [10.]
```
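Something to keep in mind about the stored file: unpickling rebuilds an object of the class that was pickled (here a plain LinearRegression), no matter which class's method does the loading. The minimal pure-Python sketch below illustrates this; `StoredModel` and `Benchmark` are hypothetical stand-ins for LinearRegression and benchmark_model, chosen so the example runs without sklearn.

```python
import pickle


class StoredModel:
    """Hypothetical stand-in for the fitted LinearRegression that gets pickled."""

    def __init__(self, coef=None):
        self.coef_ = coef


class Benchmark(StoredModel):
    """Hypothetical stand-in for the benchmark_model subclass."""

    def load_stored(self, blob):
        # The loading happens inside a Benchmark method, but the result
        # is still an instance of whatever class was pickled.
        return pickle.loads(blob)


blob = pickle.dumps(StoredModel([10.0]))
loaded = Benchmark().load_stored(blob)

print(type(loaded).__name__)          # StoredModel
print(isinstance(loaded, Benchmark))  # False
print(loaded.coef_)                   # [10.0]
```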
Next, I define my own benchmark_model class with a custom fit method, which loads the stored model:
```python
class benchmark_model(LinearRegression):
    def fit(self, X, y=None):
        with open('benchmark_model.txt', 'rb') as f:
            self = pickle.load(f)
        return self
```
Testing the custom fit on its own, with different data, seems to work:
```python
X = np.array([[1], [2], [3]])
y = [5, 10, 15]
model = benchmark_model()
model = model.fit(X, y)
print(model.coef_)
# [10.]
print(model.predict(X))
# [10. 20. 30.]
```
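Why does this standalone test succeed? `model = model.fit(X, y)` rebinds `model` to whatever fit returns, while `self = ...` inside fit only rebinds a local variable, so the instance fit was called on stays unfitted. A minimal pure-Python sketch of the difference, with hypothetical classes `Loaded` and `RebindingFit` standing in for the unpickled LinearRegression and benchmark_model:

```python
class Loaded:
    """Hypothetical stand-in for the unpickled, fitted LinearRegression."""

    def __init__(self):
        self.coef_ = [10.0]


class RebindingFit:
    """Hypothetical stand-in for benchmark_model with the rebinding fit()."""

    def fit(self, X, y=None):
        self = Loaded()  # rebinds the local name only; the caller's object is unchanged
        return self


est = RebindingFit()
returned = est.fit([[1], [2], [3]], [5, 10, 15])

print(hasattr(returned, "coef_"))  # True: the returned object carries coef_
print(hasattr(est, "coef_"))       # False: the original instance was never fitted
print(returned is est)             # False
```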
First, I use a normal LinearRegression as part of a pipeline, which works as expected:
```python
pipe = Pipeline([('model', LinearRegression())])
pipe.fit(X, y).predict(X)
# array([ 5., 10., 15.])
```
However, when I use my custom benchmark model in the pipeline, it no longer works:
```python
pipe = Pipeline([('model', benchmark_model())])
pipe.fit(X, y).predict(X)
# NotFittedError: This benchmark_model instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
```
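The failure can be reproduced without sklearn. Below, a hypothetical `TinyPipeline` mirrors the relevant part of sklearn's Pipeline as a sketch: its fit calls fit on the step it holds and returns the pipeline itself, discarding the step's return value, and its predict then uses that same step. A RuntimeError plays the role of NotFittedError; all class names are illustrative assumptions.

```python
class Loaded:
    """Hypothetical stand-in for the unpickled, fitted LinearRegression."""

    def __init__(self):
        self.coef_ = [10.0]


class RebindingFit:
    """Hypothetical stand-in for the original benchmark_model."""

    def fit(self, X, y=None):
        self = Loaded()  # local rebinding only
        return self


class TinyPipeline:
    """Hypothetical, heavily simplified stand-in for sklearn's Pipeline."""

    def __init__(self, step):
        self.step = step

    def fit(self, X, y=None):
        self.step.fit(X, y)  # the step's return value is discarded
        return self

    def predict(self, X):
        if not hasattr(self.step, "coef_"):
            raise RuntimeError("step is not fitted yet")  # plays NotFittedError
        return [self.step.coef_[0] * row[0] for row in X]


pipe = TinyPipeline(RebindingFit())
try:
    pipe.fit([[1], [2], [3]], [5, 10, 15]).predict([[1], [2], [3]])
except RuntimeError as exc:
    print(exc)  # step is not fitted yet
```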
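The problem is the assignment `self = pickle.load(...)` inside `fit`: it only rebinds the local name `self` and leaves the `benchmark_model` instance itself untouched, so `benchmark_model.fit()` returns a separate `LinearRegression` object instead of a fitted `benchmark_model`. Standalone, this goes unnoticed because `model = model.fit(X, y)` keeps the returned object. The pipeline, however, calls `fit` on the instance it holds, ignores the return value, and later calls `predict` on that same, still unfitted, instance. It works if we instead copy the learned parameters from the fixed model onto `self`: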
```python
class benchmark_model(LinearRegression):
    def fit(self, X, y=None):
        with open('benchmark_model.txt', 'rb') as f:
            fixed_model = pickle.load(f)
        # Copy the fitted attributes onto this instance instead of rebinding self
        self.coef_ = fixed_model.coef_
        self.intercept_ = fixed_model.intercept_
        return self
```
Now `fit` modifies the `benchmark_model` instance itself and returns it, so the instance held by the pipeline is fitted when `predict` is called.
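The fix can be checked with a small pure-Python mimic of the pipeline. In this sketch, a hypothetical `TinyPipeline` (like sklearn's Pipeline) ignores the value returned by the step's fit, and a hypothetical `CopyingFit` copies attributes from a stored object onto `self`, as the fit above does. An in-memory pickle replaces `benchmark_model.txt`; that substitution is an assumption made only to keep the example self-contained.

```python
import pickle


class Stored:
    """Hypothetical stand-in for the fitted LinearRegression that was pickled."""

    def __init__(self):
        self.coef_ = [10.0]
        self.intercept_ = 0.0


# In-memory pickle instead of benchmark_model.txt (assumption for self-containment)
BLOB = pickle.dumps(Stored())


class CopyingFit:
    """Hypothetical stand-in for the fixed benchmark_model."""

    def fit(self, X, y=None):
        fixed = pickle.loads(BLOB)
        # Mutate this instance instead of rebinding self
        self.coef_ = fixed.coef_
        self.intercept_ = fixed.intercept_
        return self

    def predict(self, X):
        return [self.coef_[0] * row[0] + self.intercept_ for row in X]


class TinyPipeline:
    """Hypothetical, heavily simplified stand-in for sklearn's Pipeline."""

    def __init__(self, step):
        self.step = step

    def fit(self, X, y=None):
        self.step.fit(X, y)  # the step's return value is discarded
        return self

    def predict(self, X):
        return self.step.predict(X)


pred = TinyPipeline(CopyingFit()).fit([[1], [2], [3]], [5, 10, 15]).predict([[1], [2], [3]])
print(pred)  # [10.0, 20.0, 30.0]
```

Because the step is mutated in place, it no longer matters whether the caller uses the return value of fit.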