How do I use the ML sklearn pipeline to predict?

Question

I have created an ML pipeline using sklearn_pandas and sklearn. It looks like this.

features = ['ColA','ColB','ColC']
labels = 'ColD'

mapper = sklearn_pandas.DataFrameMapper([
    ('ColB',sklearn.preprocessing.StandardScaler()),
    ('ColC',sklearn.preprocessing.StandardScaler())
])
pipe = sklearn.pipeline.Pipeline([
        ('featurize', mapper),
        ('imputer', imputer),
        ('logreg', sklearn.linear_model.LogisticRegression())
])
cross_val_score = sklearn_pandas.cross_val_score(pipe,traindf[features],
                                                 traindf[labels],
                                                 'log_loss')

I like the model and the 'log_loss' values that I am getting. How do I use this pipeline to predict my test set?

When I do pipe.predict(testX[features]) I get an error that says:

'StandardScaler' object has no attribute 'mean_'

I have checked my test set. It looks fine.

dukebody · Accepted Answer

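You have to fit the pipeline first, just as you would fit any model or transformer. cross_val_score fits copies of the pipeline on each fold and leaves pipe itself unfitted, which is why the StandardScaler has no mean_ yet: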

pipe.fit(traindf[features], traindf[labels])
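
For a fuller picture, here is a minimal sketch of the whole flow using plain scikit-learn. It makes several assumptions that are not from the question: ColumnTransformer stands in for sklearn_pandas.DataFrameMapper, SimpleImputer stands in for the question's imputer (which is not shown), and traindf and testX are small synthetic frames made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.exceptions import NotFittedError
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data; the question's real traindf/testX are not shown.
rng = np.random.default_rng(0)
traindf = pd.DataFrame({
    "ColA": rng.normal(size=100),
    "ColB": rng.normal(size=100),
    "ColC": rng.normal(size=100),
})
traindf["ColD"] = (traindf["ColB"] + traindf["ColC"] > 0).astype(int)
testX = pd.DataFrame({
    "ColA": rng.normal(size=10),
    "ColB": rng.normal(size=10),
    "ColC": rng.normal(size=10),
})

features = ["ColA", "ColB", "ColC"]
labels = "ColD"

# Assumption: ColumnTransformer replaces DataFrameMapper. Like the mapper in
# the question, it scales ColB and ColC and drops the unmapped ColA.
mapper = ColumnTransformer([("scale", StandardScaler(), ["ColB", "ColC"])])

pipe = Pipeline([
    ("featurize", mapper),
    ("imputer", SimpleImputer(strategy="mean")),  # assumed imputer
    ("logreg", LogisticRegression()),
])

# Cross-validation fits clones of the pipeline, so `pipe` is still unfitted.
scores = cross_val_score(pipe, traindf[features], traindf[labels],
                         scoring="neg_log_loss")

# Predicting now fails with a "not fitted" style error, as in the question.
try:
    pipe.predict(testX[features])
    failed_before_fit = False
except (NotFittedError, AttributeError):
    failed_before_fit = True

# Fit on the full training set, then predict on the test set.
pipe.fit(traindf[features], traindf[labels])
predictions = pipe.predict(testX[features])
probabilities = pipe.predict_proba(testX[features])
```

After the fit call, predict and predict_proba run every step of the pipeline on the test frame using the statistics learned from the training data, so the test set is never fitted separately.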
