unstuck

Reputation: 646

FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan

I'm trying to tune the learning rate (eta) and max_depth of an XGBoost regression model:

from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

param_grid = [
    # trying learning rates from 0.01 to 0.2
    {'eta ':[0.01, 0.05, 0.1, 0.2]},
    # and max depth from 4 to 10
    {'max_depth': [4, 6, 8, 10]}
  ]

xgb_model = XGBRegressor(random_state = 0)
grid_search = GridSearchCV(xgb_model, param_grid, cv=5,
                           scoring='neg_root_mean_squared_error',
                           return_train_score=True)

grid_search.fit(final_OH_X_train_scaled, y_train)

final_OH_X_train_scaled is the training dataset that contains only numerical features.

y_train is the training label - also numerical.

This produces the warning:

FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan.

I've seen other similar questions, but couldn't find an answer yet.

I also tried a single combined grid:

param_grid = [
    # trying learning rates from 0.01 to 0.2
    # and max depth from 4 to 10
    {'eta ': [0.01, 0.05, 0.1, 0.2], 'max_depth': [4, 6, 8, 10]}   
  ]

But it generates the same error.

EDIT: Here's a sample of the data:

final_OH_X_train_scaled.head()

[image: output of final_OH_X_train_scaled.head()]

y_train.head()

[image: output of y_train.head()]

EDIT2:

The data sample can be recreated with the following (pandas must be imported as pd):

final_OH_X_train_scaled = pd.DataFrame([[0.540617 ,1.204666 ,1.670791 ,-0.445424 ,-0.890944 ,-0.491098 ,0.094999 ,1.522411 ,-0.247443 ,-0.559572 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,1.0 ,0.0 ,0.0], 
                   [0.117467 ,-2.351903 ,0.718969 ,-0.119721 ,-0.874705 ,-0.530832 ,-1.385230 ,2.126612 ,-0.947731 ,-0.156967 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,1.0 ,0.0 ,0.0 ,0.0 ,0.0], 
                   [0.901138 ,-0.208256 ,-0.019134 ,0.265250 ,-0.889128 ,-0.467753 ,0.169306 ,-0.973256 ,0.056164 ,-0.671978 , 0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,1.0 ,0.0 ,0.0],
                   [2.074639 ,0.100602 ,-1.645121 ,0.929598 ,0.811911 ,1.364560 ,0.337242 ,0.435187 ,-0.388075 ,1.279959 , 0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,1.0], 
                   [2.198099 ,-0.496254 ,-0.917933 ,-1.418407 ,-0.975889 ,1.044495 ,0.254181 ,1.335285 ,2.079415 ,2.071974 , 0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,1.0 ,0.0 ,0.0 ,0.0 ,0.0]],
                  columns=['cont0' ,'cont1' ,'cont2' ,'cont3' ,'cont4' ,'cont5' ,'cont6' ,'cont7' ,'cont8' ,'cont9' ,'31' ,'32' ,'33' ,'34' ,'35' ,'36' ,'37' ,'38' ,'39' ,'40'])

Upvotes: 5

Views: 30762

Answers (3)

Woodly0

Reputation: 434

I ran into the same error and the cause wasn't an extra space in the naming. It took a long time to find what happened so I'm posting it here:

I was using a scikit-learn Pipeline as the estimator. It included a OneHotEncoder, which automatically creates 0/1 features depending on the categories present in the training set. This usually works if each category is reasonably represented. However, one feature had a very sparse category (less than 1% overall), so depending on the CV split the category was missing from the training fold, and therefore the one-hot-encoded column was missing as well. That absence caused the failure further down the pipeline, where the encoded feature was explicitly selected.

To avoid this issue when using OneHotEncoder, you should either explicitly specify the expected categories or use the min_frequency parameter.
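The first option can be illustrated with a minimal sketch. The data and the category labels "a", "b" and "c" are made up for the example and are not taken from the pipeline described above. It compares an encoder that infers its categories from a fold with one that is given the full list up front:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training fold in which the rare category "c" does not occur.
fold = np.array([["a"], ["b"], ["a"]])

# Categories inferred from the fold: only columns for "a" and "b" are created.
inferred = OneHotEncoder().fit(fold)

# Categories listed explicitly: a column for "c" exists even though it is absent.
fixed = OneHotEncoder(categories=[["a", "b", "c"]]).fit(fold)

n_inferred = inferred.transform(fold).shape[1]
n_fixed = fixed.transform(fold).shape[1]
print(n_inferred, n_fixed)
```

With explicit categories the number of encoded columns no longer depends on which rows land in a given CV split. The min_frequency route is only available in more recent scikit-learn releases, so check your installed version before relying on it.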

Upvotes: 0

heschmat

Reputation: 113

As another example, if for a LogisticRegression you set the grid to something like

grid_lr = {
    'cls__class_weight': [None, 'balanced'],
    'cls__C': [0, .001, .01, .1, 1]
}

You'll get a similar error, because C must be a positive float and the grid includes 0. So double-checking both the names and the values of the hyperparameters should be enough to resolve this issue.
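Part of the name check can be automated. The following is only a sketch: suspicious_grid_keys is a hypothetical helper (it is not part of scikit-learn), and it catches only stray leading or trailing whitespace in keys, such as the 'eta ' key from the question. It does not validate values.

```python
def suspicious_grid_keys(param_grid):
    """Return the grid keys that have leading or trailing whitespace."""
    # GridSearchCV accepts either one dict or a list of dicts.
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    return sorted({key for grid in grids for key in grid if key != key.strip()})


# The grid from the question, including the stray space after "eta".
param_grid = [
    {"eta ": [0.01, 0.05, 0.1, 0.2]},
    {"max_depth": [4, 6, 8, 10]},
]

print(suspicious_grid_keys(param_grid))
```

Running a check like this before calling fit is cheap, and it points at the offending key directly instead of leaving you with a generic warning.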

Upvotes: 0

TC Arlen

Reputation: 1482

I was able to reproduce the problem. The code fails to fit because there is an extra space in your eta parameter: the key is written as 'eta ' (with a trailing space). Instead of this:

{'eta ':[0.01, 0.05, 0.1, 0.2]},...

Change it to this:

{'eta':[0.01, 0.05, 0.1, 0.2]},...

The error message was unfortunately not very helpful.
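One way to get a more useful message is to pass error_score='raise' to GridSearchCV, which re-raises the exception from the failing fit instead of recording nan. The sketch below is a stand-in, not the code from the question: it uses LogisticRegression with a deliberately invalid C=0 on synthetic data so that it needs only scikit-learn. The same option can be passed to the XGBRegressor search.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; not the data from the question.
X, y = make_classification(n_samples=60, n_features=5, random_state=0)

# C=0 is deliberately invalid so that at least one fit fails.
param_grid = {"C": [0, 1.0]}

search = GridSearchCV(
    LogisticRegression(),
    param_grid,
    cv=3,
    error_score="raise",  # re-raise the fit error instead of scoring nan
)

caught = None
try:
    search.fit(X, y)
except Exception as exc:  # broad on purpose: this is only a diagnostic run
    caught = exc
    print(type(exc).__name__, exc)
```

Once the real exception is visible, switch error_score back to its default for the actual search if you want a single bad candidate not to abort the whole run.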

Upvotes: 4
