Scikit learn GridSearchCV with pipeline with custom transformer

Question

I'm trying to run a GridSearchCV on a pipeline that contains a custom transformer. The transformer expands the features "year" and "odometer" polynomially and one-hot encodes the remaining features. The model is a simple linear regression.
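
For reference, here is a small sketch of how wide the polynomial part gets, using made-up values for the two numeric columns. With the `PolynomialFeatures` defaults (bias column included), two input columns expand to C(d+2, 2) columns at degree d. The degrees tried in the grid search (3, 4, 5) therefore give 10, 15 and 21 polynomial columns before the one-hot columns are appended:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two made-up (year, odometer) rows; only the resulting column count matters here
X = np.array([[2010.0, 120000.0],
              [2015.0, 45000.0]])

widths = {}
for degree in (2, 3, 4, 5):
    # Default include_bias=True adds a constant column of ones
    widths[degree] = PolynomialFeatures(degree=degree).fit_transform(X).shape[1]

print(widths)  # {2: 6, 3: 10, 4: 15, 5: 21}
```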

Custom transformer code:

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import OneHotEncoder 
from sklearn.preprocessing import PolynomialFeatures

class custom_poly_features(TransformerMixin, BaseEstimator):
    def __init__(self, degree = 2, poly_features = ['year', 'odometer']):
        self.degree_ = degree
        self.poly_features_ = poly_features       
    def fit(self, X, y=None):
        # Nothing is learned here; return the transformer itself
        return self
    def transform(self, X, y=None):
        poly_feat = PolynomialFeatures(degree=self.degree_)
        OneHot = OneHotEncoder(sparse=False)

        not_poly_features = list(set(X.columns) - set(self.poly_features_))
        poly = poly_feat.fit_transform(X[self.poly_features_].to_numpy())
        poly = np.hstack([poly, OneHot.fit_transform(X[not_poly_features].to_numpy())])

        return poly
    def get_params(self, deep=True):
        return {"degree": self.degree_, "poly_features": self.poly_features_}
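
One convention is worth checking here. The sketch below uses a hypothetical minimal class (`StoresUnderscoreParams`) that copies the `__init__`/`get_params` pattern above. Scikit-learn expects every `__init__` argument to be stored unchanged under the same name, and it reserves a trailing underscore for attributes learned in `fit`. The inherited `BaseEstimator.set_params`, which `GridSearchCV` uses to apply each grid point, assigns to an attribute named after the parameter. The grid's `degree` values would therefore land on `self.degree`, while `transform` keeps reading `self.degree_`. This alone would not produce NaN scores, but every candidate would silently run with the degree given at construction:

```python
from sklearn.base import BaseEstimator, TransformerMixin


class StoresUnderscoreParams(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in mirroring the question's __init__/get_params pattern."""

    def __init__(self, degree=2):
        self.degree_ = degree  # stored under a different name than the argument

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

    def get_params(self, deep=True):
        return {"degree": self.degree_}


t = StoresUnderscoreParams(degree=2)
# set_params does setattr(self, "degree", 5), as it would for a grid point
t.set_params(degree=5)
print(t.degree_)  # 2: unchanged, so transform() would keep using the old degree
print(t.degree)   # 5: stored on an attribute that nothing reads
```

Storing `self.degree = degree` and `self.poly_features = poly_features`, and dropping the custom `get_params`, lets the inherited methods handle this.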

Pipeline and grid search code:

#create pipeline
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

poly_pipeline =  Pipeline(steps=[("cpf", custom_poly_features()), ("lin_reg", LinearRegression(n_jobs=-1))])

#perform gridsearch
from sklearn.model_selection import GridSearchCV
param_grid = {"cpf__degree": [3, 4, 5]}

search = GridSearchCV(poly_pipeline, param_grid, n_jobs=-1, cv=3)
search.fit(X_train_ordinal, y_train)
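
The `param_grid` keys are routed to pipeline steps as `<step name>__<parameter>`, so the prefix must match the name used in `Pipeline(steps=...)`, here `cpf`. The `cv_results_` printed below uses `custom_poly_features__degree` instead, so it presumably comes from a run in which the step had that name. Here is a minimal sketch for listing the valid keys, with the built-in `PolynomialFeatures` as a stand-in for the custom step:

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# PolynomialFeatures stands in for the custom step; only the step name matters here
pipe = Pipeline(steps=[("cpf", PolynomialFeatures()), ("lin_reg", LinearRegression())])

# Valid grid keys have the form "<step name>__<parameter name>"
cpf_keys = sorted(k for k in pipe.get_params() if k.startswith("cpf__"))
print(cpf_keys)  # includes "cpf__degree"

# A prefix that matches no step name is rejected by set_params
try:
    pipe.set_params(custom_poly_features__degree=3)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```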

The custom transformer itself works fine, and the pipeline also works (the score is not great, but that is not the topic here):

poly_pipeline.fit(X_train, y_train).score(X_test, y_test)

Output:
0.543546844381771
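
The following is offered as a hypothesis about the source of the NaNs. `transform` above fits a new `OneHotEncoder` on whatever rows it receives, so the number of one-hot columns depends on which categories happen to occur in those rows. One particular train/test split can produce matching widths by chance. However, each cross-validation fold is a different split, and `LinearRegression` cannot score data that has a different number of columns than it was fitted on. Here is a sketch with a made-up categorical column:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up categorical column; the first three rows play "train", the rest "test"
df = pd.DataFrame({"fuel": ["gas", "gas", "diesel", "electric", "gas", "hybrid"]})
train, test = df.iloc[:3], df.iloc[3:]

# Refitting on each call (as transform() does): the width depends on the rows seen
train_width = OneHotEncoder().fit_transform(train).shape[1]  # {diesel, gas} -> 2
test_width = OneHotEncoder().fit_transform(test).shape[1]    # {electric, gas, hybrid} -> 3
print(train_width, test_width)  # 2 3

# Fitting once on the training rows and reusing the encoder keeps the width fixed
enc = OneHotEncoder(handle_unknown="ignore").fit(train)
fixed_widths = (enc.transform(train).shape[1], enc.transform(test).shape[1])
print(fixed_widths)  # (2, 2): unseen categories become all-zero rows
```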

However, when I run the grid search, all the scores are NaN:

search.cv_results_

Output:
{'mean_fit_time': array([4.46928191, 4.58259885, 4.55605125]),
 'std_fit_time': array([0.18111937, 0.03305779, 0.02080789]),
 'mean_score_time': array([0.21119197, 0.13816587, 0.11357466]),
 'std_score_time': array([0.09206233, 0.02171508, 0.02127906]),
 'param_custom_poly_features__degree': masked_array(data=[3, 4, 5],
          mask=[False, False, False],
    fill_value='?',
         dtype=object),
 'params': [{'custom_poly_features__degree': 3},
  {'custom_poly_features__degree': 4},
  {'custom_poly_features__degree': 5}],
 'split0_test_score': array([nan, nan, nan]),
 'split1_test_score': array([nan, nan, nan]),
 'split2_test_score': array([nan, nan, nan]),
 'mean_test_score': array([nan, nan, nan]),
 'std_test_score': array([nan, nan, nan]),
 'rank_test_score': array([1, 2, 3])}
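
On where the NaNs come from: `GridSearchCV` defaults to `error_score=np.nan`. With that setting, an exception raised while fitting or scoring a candidate is recorded as `nan` (with a warning) instead of stopping the search. The fit times above are several seconds each, which hints that fitting succeeded and scoring failed. Passing `error_score="raise"` shows the actual traceback. The sketch below reproduces both behaviors with a hypothetical toy transformer (`RefitsEncoderInTransform`) that, like `custom_poly_features`, fits its `OneHotEncoder` inside `transform`. The data is made up:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


class RefitsEncoderInTransform(TransformerMixin, BaseEstimator):
    """Hypothetical toy transformer that, like the question's, fits its encoder in transform()."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # New encoder on every call: the column count depends on the rows passed in
        return OneHotEncoder().fit_transform(X).toarray()


# Without shuffling, KFold(2) uses rows 0-2 and rows 3-5 as the two folds,
# and the two halves contain different numbers of categories
X = pd.DataFrame({"fuel": ["gas", "gas", "gas", "diesel", "electric", "hybrid"]})
y = np.arange(6, dtype=float)

pipe = Pipeline([("enc", RefitsEncoderInTransform()), ("lin_reg", LinearRegression())])
grid = {"lin_reg__fit_intercept": [True, False]}
cv = KFold(n_splits=2)

# Default error_score=np.nan: the scoring exception becomes NaN plus a warning
quiet = GridSearchCV(pipe, grid, cv=cv, refit=False).fit(X, y)
print(quiet.cv_results_["mean_test_score"])  # all NaN

# error_score="raise" lets the underlying exception propagate instead
raised = None
try:
    GridSearchCV(pipe, grid, cv=cv, refit=False, error_score="raise").fit(X, y)
except ValueError as exc:
    raised = exc
    print(f"{type(exc).__name__}: {exc}")
```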

Does anyone know what the problem is? The transformer and the pipeline work fine on their own after all.
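
Below is a sketch of the transformer restructured along scikit-learn's estimator conventions. It assumes the NaNs come from the encoder being refitted inside `transform`. Constructor arguments are stored unchanged under their own names, so the custom `get_params` can go. Both `PolynomialFeatures` and `OneHotEncoder` are fitted once in `fit` and only applied in `transform`, and `handle_unknown="ignore"` covers categories that appear only in a validation fold. `.toarray()` replaces `sparse=False`, since that argument was renamed `sparse_output` in scikit-learn 1.2. The class name `CustomPolyFeatures` and the toy data are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures


class CustomPolyFeatures(TransformerMixin, BaseEstimator):
    """Polynomial expansion of selected numeric columns plus one-hot encoding of the rest."""

    def __init__(self, degree=2, poly_features=("year", "odometer")):
        # Store arguments unchanged under their own names so the inherited
        # get_params/set_params and clone() see the values the grid sets.
        self.degree = degree
        self.poly_features = poly_features

    def fit(self, X, y=None):
        poly_cols = list(self.poly_features)
        # Keep the remaining columns in their original order (a set would not)
        self.other_features_ = [c for c in X.columns if c not in poly_cols]
        # Learn the expansion and the category lists once, from training data only
        self.poly_ = PolynomialFeatures(degree=self.degree).fit(X[poly_cols].to_numpy())
        # Categories unseen during fit are encoded as all-zero rows
        self.onehot_ = OneHotEncoder(handle_unknown="ignore").fit(
            X[self.other_features_].to_numpy()
        )
        return self

    def transform(self, X):
        poly = self.poly_.transform(X[list(self.poly_features)].to_numpy())
        onehot = self.onehot_.transform(X[self.other_features_].to_numpy()).toarray()
        return np.hstack([poly, onehot])


# Hypothetical toy data using the question's column names (values are made up)
rng = np.random.default_rng(0)
n = 60
X = pd.DataFrame({
    "year": rng.integers(2000, 2020, n).astype(float),
    "odometer": rng.uniform(0, 200, n),
    "fuel": rng.choice(["gas", "diesel", "electric"], n),
})
y = -0.5 * X["odometer"].to_numpy() + rng.normal(size=n)

pipe = Pipeline([("cpf", CustomPolyFeatures()), ("lin_reg", LinearRegression())])
search = GridSearchCV(pipe, {"cpf__degree": [1, 2, 3]}, cv=KFold(n_splits=3))
search.fit(X, y)
print(search.cv_results_["mean_test_score"])  # finite scores, one per degree
print(search.best_params_)
```

The built-in `ColumnTransformer` can do the same column split without a custom class. In that case the grid key gains one more level, e.g. `cpf__poly__degree` for a `ColumnTransformer` step named `cpf` whose polynomial transformer is named `poly`.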
