
Reputation: 115

How do I fix: "FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan?"

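import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, KFold

# Try every possible number of selected features, from 1 up to all of them
param_grid = {'select__k': np.arange(1, data_x_numeric.shape[1] + 1)}
cv = KFold(n_splits=3, random_state=1, shuffle=True)
gcv = GridSearchCV(pipe, param_grid, return_train_score=True, cv=cv)
gcv.fit(data_x, data_y)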

results = pd.DataFrame(gcv.cv_results_).sort_values(by='mean_test_score', ascending=False)
results.loc[:, ~results.columns.str.endswith("_time")]

After running the above code I get a warning advising that estimator fit failed.

FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
line 598, in _fit_and_score, y_train, **fit_params)
"," line 341, in fit Xt = self._fit(X, y, **fit_params_steps) "," line 303, in _fit X, fitted_transformer = fit_transform_one_cached(
"," line 352, in __call__ return self.func(*args, **kwargs) "," line 754, in _fit_transform_one res = transformer.fit_transform(X, y, **fit_params)
"," line 702, in fit_transform return, y, **fit_params).transform(X), line 353, in fit score_func_ret = self.score_func(X, y)
"<ipython-input-413-f8e48283bbee>," line 7, in fit_and_score_features, y)
"" line 426, in fit delta = solve(optimizer.hessian, optimizer.gradient,
"," line 214, in solve _solve_check(n, info)
"," line 29, in _solve_check raise LinAlgError('Matrix is singular.')
numpy.linalg.LinAlgError: Matrix is singular.

  warnings.warn("Estimator fit failed. The score on this train-test"
"": FutureWarning: The `inplace` parameter in pandas.Categorical.set_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
  res = method(*args, **kwargs)
"": FutureWarning: The `inplace` parameter in pandas.Categorical.set_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
  res = method(*args, **kwargs)
"": FutureWarning: The `inplace` parameter in pandas.Categorical.set_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
  res = method(*args, **kwargs)
"": FutureWarning: The `inplace` parameter in pandas.Categorical.set_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
  res = method(*args, **kwargs)

I get this warning multiple times, and the code keeps running for more than 30 minutes. I have removed the file paths from much of the traceback, which is why it may look different from the usual output.

I am following the scikit-survival documentation and am stuck at this point. Some of the additional code below may help with diagnosing the error, but I am not sure what is causing it.

data_x is a pandas DataFrame with the following data types:


f1   category
f2   category
f3   category
f4   float64
f5   category
f6   category
f7   category
f8   category
f9   category
f10  category
f11  category
f12  category
f13  int64
f14  category
f15  category
f16  category
f17  category
f18  category
f19  category
f20  category
f21  int64
dtype: object

data_y is a NumPy structured array:


array([( True, 481.), ( True, 424.), ( True, 519.), ..., ( True,  13.),
       ( True,  96.), ( True,   6.)],
      dtype=[('event', '?'), ('duration', '<f8')])

data_x_numeric is the one-hot-encoded version of data_x used for prediction.

data_x_numeric = OneHotEncoder().fit_transform(data_x)

I also obtained individual c-index scores for each feature.

def fit_and_score_features(X, y):
    n_features = X.shape[1]
    scores = np.empty(n_features)
    m = CoxPHSurvivalAnalysis()
    for j in range(n_features):
        Xj = X[:, j:j+1]
        m.fit(Xj, y)
        scores[j] = m.score(Xj, y)
    return scores

scores = fit_and_score_features(data_x_numeric.values, data_y)
pd.Series(scores, index=data_x_numeric.columns).sort_values(ascending=False)

f1   0.631355
f2   0.564762
f3   0.564288
f4   0.554376
f5   0.549956
f94  0.498701
f95  0.498413
f96  0.483840
f97  0.460941
f98  0.460898

I then created a pipeline.

# Create the pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline

pipe = Pipeline([('encode', OneHotEncoder()),
                 ('select', SelectKBest(fit_and_score_features, k=3)),
                 ('model', CoxPHSurvivalAnalysis())])

This is the point where I applied the code from the beginning of the post in order to select my best features to maximize the overall c-index score. I am not quite sure what is going on and would greatly appreciate any help.

Upvotes: 2

Views: 31235

Answers (4)

Nityam Vakil

Reputation: 1

I resolved this issue by changing the 'penalty' term. I had 'elasticnet' as the penalty for my Logistic Regression model, which was driving some of the coefficients to 0. The trick is to use a penalty that does not set coefficients to 0.

Upvotes: 0

drishti agrawal

Reputation: 1

Check for NaN values present in the dataset. I had the same error and it got resolved after replacing them.
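
As a rough sketch of that check, assuming the features live in a pandas DataFrame: the toy frame `df` and its values below are made up for illustration, and filling with the column median is only one possible replacement strategy (it applies to numeric columns only).

```python
import numpy as np
import pandas as pd

# Toy stand-in for the feature table (hypothetical values, not the asker's data).
df = pd.DataFrame({
    "f4": [1.5, np.nan, 3.0, 4.5],
    "f13": [10.0, 20.0, np.nan, 40.0],
})

# Count missing values per column to see where the gaps are.
nan_counts = df.isna().sum()
print(nan_counts)

# Replace each NaN with the median of its own column (one possible choice).
df_filled = df.fillna(df.median())

# Total number of NaN values left after filling.
remaining = int(df_filled.isna().sum().sum())
print(remaining)
```

Categorical columns would need a different fill value, for example the most frequent category.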

Upvotes: 0

Pepeti Siddhardha

Reputation: 1

For me, the error arose because my parameter grid contained max_features: [1, 3, 5, 7], but my data only has 6 features, so the fit failed for the value 7. After I removed 7 and left max_features: [1, 3, 5], my code ran without problems.

So I would suggest that everybody check the hyperparameters they are passing before running a randomized or grid search CV.
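
A minimal sketch of such a check, in plain Python. The helper name `valid_grid_values` is made up for this example and is not part of scikit-learn; `n_features` would normally come from `X.shape[1]`.

```python
def valid_grid_values(candidates, n_features):
    """Keep only candidate values between 1 and n_features (inclusive)."""
    return [k for k in candidates if 1 <= k <= n_features]


n_features = 6  # assumed here; in practice use X.shape[1]

# 7 exceeds the number of available features, so it is filtered out.
max_features_grid = valid_grid_values([1, 3, 5, 7], n_features)
print(max_features_grid)

# The filtered list can then be used as the grid for the search.
param_grid = {"max_features": max_features_grid}
```

The same idea applies to any hyperparameter bounded by the number of columns, such as the k of SelectKBest.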

Upvotes: 0

Darrell Corn

Reputation: 11

Check for missing data. I had the same error. The program ran fine once I deleted the rows with empty cells.
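
A small sketch of dropping incomplete rows while keeping the target aligned, assuming the features are a pandas DataFrame and the target is a NumPy structured array like the one in the question. The names `X`, `y` and all values are invented for the example.

```python
import numpy as np
import pandas as pd

# Toy features and survival target (hypothetical values).
X = pd.DataFrame({"f4": [1.0, np.nan, 3.0], "f13": [5.0, 6.0, 7.0]})
y = np.array(
    [(True, 481.0), (True, 424.0), (False, 519.0)],
    dtype=[("event", "?"), ("duration", "<f8")],
)

# Boolean mask marking the rows that have no missing value in any column.
complete = X.notna().all(axis=1).to_numpy()

# Apply the same mask to both objects so rows and targets stay paired.
X_clean = X.loc[complete]
y_clean = y[complete]
print(len(X_clean), len(y_clean))
```

Using one mask for both, rather than calling dropna on the features alone, avoids a length mismatch between the features and the target at fit time.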

Upvotes: 1
