I have created an ML pipeline using sklearn_pandas and sklearn. It looks like this:
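import sklearn.impute
import sklearn.linear_model
import sklearn.model_selection
import sklearn.pipeline
import sklearn.preprocessing
import sklearn_pandas

features = ['ColA', 'ColB', 'ColC']
labels = 'ColD'

# Listing each column as ['ColB'] passes a 2-D array, which StandardScaler
# requires. Columns not listed in the mapper (here ColA) are dropped by default.
mapper = sklearn_pandas.DataFrameMapper([
    (['ColB'], sklearn.preprocessing.StandardScaler()),
    (['ColC'], sklearn.preprocessing.StandardScaler()),
])

# The imputer's definition was not shown; SimpleImputer is an assumed stand-in.
imputer = sklearn.impute.SimpleImputer()

pipe = sklearn.pipeline.Pipeline([
    ('featurize', mapper),
    ('imputer', imputer),
    ('logreg', sklearn.linear_model.LogisticRegression()),
])

# 'neg_log_loss' is the current scorer name for (negated) log loss.
scores = sklearn.model_selection.cross_val_score(
    pipe, traindf[features], traindf[labels], scoring='neg_log_loss')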
I like the model and the log-loss values I am getting.
How do I use this pipeline to make predictions on my test set?
When I call pipe.predict(testX[features]), I get this error:
'StandardScaler' object has no attribute 'mean_'
I have checked my test set. It looks fine.
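You have to fit the pipeline first, just as you would fit any model or transformer. cross_val_score trains a clone of pipe on each fold and leaves pipe itself unfitted, which is why its StandardScaler has no mean_ attribute yet: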
pipe.fit(traindf[features], traindf[labels])
predictions = pipe.predict(testX[features])
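Below is a self-contained sketch of the whole workflow that you can run without the asker's data. It makes a few assumptions. It uses hypothetical synthetic data in place of traindf and testX. It swaps scikit-learn's ColumnTransformer in for sklearn_pandas.DataFrameMapper, so only scikit-learn, NumPy and pandas are needed. It also assumes SimpleImputer for the imputer the question never shows. The script checks that the pipeline is still unfitted after cross_val_score, then fits it and predicts.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.exceptions import NotFittedError
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

features = ['ColA', 'ColB', 'ColC']
labels = 'ColD'

# Hypothetical synthetic data standing in for traindf and testX.
rng = np.random.default_rng(0)
traindf = pd.DataFrame(rng.normal(size=(100, 3)), columns=features)
traindf[labels] = (traindf['ColB'] + traindf['ColC'] > 0).astype(int)
testX = pd.DataFrame(rng.normal(size=(10, 3)), columns=features)

# ColumnTransformer stands in for sklearn_pandas.DataFrameMapper; like the
# mapper, by default it drops columns it was not told about (here ColA).
featurize = ColumnTransformer([('scale', StandardScaler(), ['ColB', 'ColC'])])

pipe = Pipeline([
    ('featurize', featurize),
    ('imputer', SimpleImputer()),  # assumed stand-in for the unshown imputer
    ('logreg', LogisticRegression()),
])

# cross_val_score fits a clone of pipe on each fold; pipe itself stays unfitted.
scores = cross_val_score(pipe, traindf[features], traindf[labels],
                         scoring='neg_log_loss')

try:
    pipe.predict(testX[features])
    fitted_after_cv = True
except NotFittedError:
    fitted_after_cv = False  # the same failure the question reports

# Fit once on the full training set, then predict on the test set.
pipe.fit(traindf[features], traindf[labels])
predictions = pipe.predict(testX[features])
probabilities = pipe.predict_proba(testX[features])

# After fitting, the scaler inside the pipeline has its mean_ attribute.
fitted_scaler = pipe.named_steps['featurize'].named_transformers_['scale']
print(fitted_after_cv, predictions.shape, fitted_scaler.mean_.shape)
```

The per-fold scores from cross_val_score come from throwaway clones. The model you actually predict with is the one produced by the final pipe.fit call on the full training set.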