user1769197

Reputation: 2213

Different results from random forest after fixing the random state

I have the following code, in which I have already set the random state. Every time I run the cross-validation, it gives me a new set of optimal parameters. This doesn't make sense to me. Why is this happening?

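import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline

rs = 5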
param_range = np.arange(1,150,10,dtype=int)
param_range2 = np.arange(5,20,5,dtype=int)
pipe_steps = [('rfc',RandomForestClassifier())]
check_params = {
    'rfc__n_estimators':param_range,
    'rfc__max_depth':param_range2
}


pipeline = Pipeline(pipe_steps)

print('-------------------------- CV Start - Fitting training data --------------------------')
for K in [5,8,10]:
    create_grid = GridSearchCV(pipeline,param_grid=check_params,cv=KFold(n_splits=K, random_state=rs, shuffle=True))
    create_grid.fit(X_train,y_train)
    print('********************* Pipeline %d fold CV *********************' % (K))
    print(create_grid.best_params_)
    print("test score:= %3.2f" % (create_grid.score(X_test,y_test)))
print("CV End")

The first time I ran the code, it gave me the output below:

-------------------------- CV Start - Fitting training data --------------------------
********************* Pipeline 5 fold CV *********************
{'rfc__max_depth': 10, 'rfc__n_estimators': 21}
test score:= 0.53
********************* Pipeline 8 fold CV *********************
{'rfc__max_depth': 10, 'rfc__n_estimators': 101}
test score:= 0.61
********************* Pipeline 10 fold CV *********************
{'rfc__max_depth': 5, 'rfc__n_estimators': 81}
test score:= 0.68
CV End

The second time I ran the code, the optimal parameters changed:

-------------------------- CV Start - Fitting training data --------------------------
********************* Pipeline 5 fold CV *********************
{'rfc__max_depth': 10, 'rfc__n_estimators': 81}
test score:= 0.55
********************* Pipeline 8 fold CV *********************
{'rfc__max_depth': 15, 'rfc__n_estimators': 71}
test score:= 0.53
********************* Pipeline 10 fold CV *********************
{'rfc__max_depth': 15, 'rfc__n_estimators': 81}
test score:= 0.63
CV End

Upvotes: 0

Views: 807

Answers (1)

desertnaut

Reputation: 60321

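To get reproducible results, you have to set the seed for every step in the code that involves randomness. Here you set it for KFold (GridSearchCV has no random_state parameter of its own; it is deterministic given its estimator and CV splitter), but not for your RandomForestClassifier, which draws bootstrap samples and random feature subsets every time it is fitted. You should initialize it as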

pipe_steps = [('rfc',RandomForestClassifier(random_state=rs))]
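
To check that this is enough, here is a minimal sketch that runs the same grid search twice and compares the results. The synthetic data from make_classification, the reduced parameter grid, and the helper name run_search are assumptions made for illustration, since the original X_train and y_train are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.pipeline import Pipeline

rs = 5

# Synthetic stand-in data (assumption: the question's real data is not shown).
X, y = make_classification(n_samples=200, n_features=10, random_state=rs)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rs)


def run_search(seed):
    """Hypothetical helper: one seeded grid search, returns (best_params, test_score)."""
    # Seed the forest itself, not only the CV splitter.
    pipeline = Pipeline([('rfc', RandomForestClassifier(random_state=seed))])
    # Smaller grid than in the question, only to keep the sketch fast.
    check_params = {
        'rfc__n_estimators': [11, 21],
        'rfc__max_depth': [5, 10],
    }
    grid = GridSearchCV(
        pipeline,
        param_grid=check_params,
        cv=KFold(n_splits=5, shuffle=True, random_state=seed),
    )
    grid.fit(X_train, y_train)
    return grid.best_params_, grid.score(X_test, y_test)


first = run_search(rs)
second = run_search(rs)

# With every source of randomness seeded, both runs should agree.
print(first == second)
```

If you remove random_state=seed from RandomForestClassifier in this sketch, the two runs are free to disagree again, which is the behaviour described in the question.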

Upvotes: 1
