Reputation: 1
I want to use LogisticRegression to classify, so I use RandomizedSearchCV to pick the best C parameter for LogisticRegression.
My question is: why does best_params_ change every time I run this program? I assumed that best_params_ would always stay the same.
Code as follows:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, StratifiedKFold, RandomizedSearchCV
from sklearn.linear_model import LogisticRegression

data = load_iris().data
target = load_iris().target
# Data split
TrainData, TestData, TrainTarget, TestTarget = train_test_split(data, target, test_size=0.25, random_state=0)
assert len(TrainData) == len(TrainTarget)
Skf = StratifiedKFold(n_splits=5)
# Model
LR = LogisticRegression(C=10, multi_class='multinomial', penalty='l2', solver='sag', max_iter=10000, random_state=0)
# Parameter selection with cross-validation
params = {'C': np.random.randint(1, 10, 10)}
RS = RandomizedSearchCV(LR, params, return_train_score=True, error_score=0, random_state=0)
RS.fit(TrainData, TrainTarget)
Result = pd.DataFrame(RS.cv_results_)
print(RS.best_params_)
Upvotes: 0
Views: 1682
Reputation: 36599
You are correctly setting random_state on LogisticRegression and RandomizedSearchCV. But there's one more source of randomness that can change the result, and that's when you generate the params using np.random. Those candidate values change on each run.
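As a quick illustration (not part of the original post), drawing the candidates twice without a fixed seed already produces two different sets:

import numpy as np

# two consecutive, unseeded draws of the candidate C values
first = np.random.randint(1, 10, 10)
second = np.random.randint(1, 10, 10)
print(first)    # ten values in [1, 9]
print(second)   # almost certainly a different ten values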
To control this behaviour, you can set numpy.random.seed() to an integer of your choice. Something like this at the top of your code:
np.random.seed(0)
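Concretely (a minimal sketch, reusing LR, TrainData and TrainTarget from the question's code), the seed call just has to come before the params dict is built, so the same ten C candidates are drawn on every run:

np.random.seed(0)                                 # fix NumPy's global RNG first
params = {'C': np.random.randint(1, 10, 10)}      # same 10 candidate C values every run
RS = RandomizedSearchCV(LR, params, return_train_score=True, error_score=0, random_state=0)
RS.fit(TrainData, TrainTarget)
print(RS.best_params_)                            # now identical across runs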
Note: Doing this will also set the seed for all scikit-learn modules that rely on NumPy's global random state, because scikit-learn uses it internally. So you may not need to set random_state everywhere in that case, but that is not recommended.
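If you would rather not touch the global seed, one alternative (a sketch, not taken from the answer above, again reusing LR, TrainData and TrainTarget) is to let RandomizedSearchCV draw the candidates itself from a scipy.stats distribution; the sampling is then controlled by the search's own random_state:

from scipy.stats import randint

# randint(1, 10) is sampled by RandomizedSearchCV using its random_state,
# so no np.random.seed() call is needed for reproducible best_params_
params = {'C': randint(1, 10)}
RS = RandomizedSearchCV(LR, params, n_iter=10, random_state=0,
                        return_train_score=True, error_score=0)
RS.fit(TrainData, TrainTarget)
print(RS.best_params_)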
See this answer: Should I use `random.seed` or `numpy.random.seed` to control random number generation in `scikit-learn`?
You may want to check these resources also:
Upvotes: 2