shen ruxuan

RandomizedSearchCV best_params_ changes every time I run this program

I want to use LogisticRegression for classification, so I use RandomizedSearchCV to pick the best C parameter for LogisticRegression.

My question is: why does best_params_ change every time I run this program? I expected best_params_ to stay the same across runs.

Code as follows:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, train_test_split

data = load_iris().data
target = load_iris().target

# DATA Split

TrainData, TestData, TrainTarget, TestTarget = train_test_split(data, target, test_size=0.25, random_state=0)
assert len(TrainData) == len(TrainTarget)
Skf = StratifiedKFold(n_splits=5)

# Model

LR = LogisticRegression(C=10, multi_class='multinomial', penalty='l2', solver='sag', max_iter=10000, random_state=0)

# Params selection with Cross Validation
params = {'C': np.random.randint(1, 10, 10)}
# use the stratified 5-fold splitter defined above
RS = RandomizedSearchCV(LR, params, cv=Skf, return_train_score=True, error_score=0, random_state=0)

RS.fit(TrainData, TrainTarget)

Result = pd.DataFrame(RS.cv_results_)
print(RS.best_params_)


Answers (1)

Vivek Kumar

You are correctly setting random_state on both LogisticRegression and RandomizedSearchCV. But there is one more source of randomness: the candidate values of C, which you generate with np.random. NumPy's global random generator is not seeded, so params['C'] holds a different set of values on each run, and the best C can change along with them.

To control this, you can seed NumPy's global random number generator by calling numpy.random.seed() with an integer of your choice. Something like this at the top of your code:

np.random.seed(0)
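A minimal sketch of the question's search with the global seed applied. To keep it fast it assumes the default lbfgs solver with max_iter=1000 instead of the original sag settings and leaves out multi_class; the helper name run_search is hypothetical. Calling it twice with the same seed should return the same best_params_:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, train_test_split


def run_search(seed=0):
    # Hypothetical helper: seed NumPy's global RNG first, so the
    # candidate C values drawn below are identical on every call.
    np.random.seed(seed)
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)
    # Same candidate generation as in the question (10 integers in [1, 9]).
    params = {'C': np.random.randint(1, 10, 10)}
    lr = LogisticRegression(max_iter=1000)  # assumption: default lbfgs solver
    rs = RandomizedSearchCV(lr, params, cv=StratifiedKFold(n_splits=5),
                            return_train_score=True, error_score=0,
                            random_state=0)
    rs.fit(X_train, y_train)
    return rs.best_params_


print(run_search(0) == run_search(0))  # True
```

With the seed fixed, the only remaining randomness comes from the explicit random_state arguments, so repeated runs agree.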

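Note: doing this also seeds scikit-learn wherever random_state is left as None, because scikit-learn then falls back to NumPy's global random state. So in that case you may not need to set random_state everywhere, but relying on the global seed instead of explicit random_state values is not recommended.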
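A small check of the behavior described in the note (the helper name split_with_global_seed is hypothetical): with random_state=None, train_test_split draws from NumPy's global state, so re-seeding it before each call reproduces the same split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)


def split_with_global_seed(seed):
    # Seed NumPy's global RNG; random_state=None makes scikit-learn
    # use that global state instead of a dedicated one.
    np.random.seed(seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=None)
    return X_train, X_test


first = split_with_global_seed(0)
second = split_with_global_seed(0)
print(all(np.array_equal(a, b) for a, b in zip(first, second)))  # True
```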

See this answer - Should I use `random.seed` or `numpy.random.seed` to control random number generation in `scikit-learn`?.
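Another option, not part of the original answer: pass C as a scipy.stats distribution instead of a pre-drawn array. RandomizedSearchCV then samples the candidates itself using its own random_state, so no global seed is needed. A minimal sketch, assuming the default lbfgs solver and a hypothetical helper best_c:

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)


def best_c(search_seed):
    # Hypothetical helper. C is a distribution over the integers 1..9
    # (randint's upper bound is exclusive), not a pre-drawn array, so the
    # search draws its n_iter candidates with its own random_state.
    params = {'C': randint(1, 10)}
    search = RandomizedSearchCV(
        LogisticRegression(max_iter=1000),  # assumption: default lbfgs solver
        params, n_iter=10, random_state=search_seed)
    search.fit(X_train, y_train)
    return search.best_params_['C']


print(best_c(0) == best_c(0))  # True
```

Because at least one entry is a distribution, candidates are sampled with replacement, so the same C may be tried more than once.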

You may want to check these resources also:
