Reputation: 1
I am testing some possibilities around random_state. Can you explain how random_state = 0 and random_state = numpy.random.RandomState(0) differ from each other?
Code
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import numpy as np
import random
for i in range(5):
    ########### code for random_state = numpy.random.RandomState(i) ##############
    rng = np.random.RandomState(i)
    X, y = make_classification(random_state=rng)
    rf = RandomForestClassifier(random_state=rng)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rng)
    p1 = rf.fit(X_train, y_train).score(X_test, y_test)

    ########### code for random_state = integer ##############
    X, y = make_classification(random_state=i)
    rf = RandomForestClassifier(random_state=i)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=i)
    p2 = rf.fit(X_train, y_train).score(X_test, y_test)

    print(i, p1, p2)
Output
0 0.84 0.92
1 1.0 0.92
2 0.88 0.92
3 0.84 0.88
4 1.0 1.0
Upvotes: 0
Views: 917
Reputation: 1435
Setting random_state = 1 sets a fixed seed (here 1), so every call that receives it produces the same train/test split.
Setting random_state = np.random.RandomState(1) passes a generator instance seeded with 1. That instance's internal state advances every time it is consumed, so the successive calls (make_classification, RandomForestClassifier, train_test_split) each draw different random numbers instead of all starting from the same fixed seed, which is why p1 and p2 differ in your output.
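To see the difference in isolation, here is a minimal sketch using only train_test_split on tiny toy data (purely illustrative):
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Integer seed: every call re-creates the same fixed split.
a1, _, _, _ = train_test_split(X, y, random_state=0)
a2, _, _, _ = train_test_split(X, y, random_state=0)
print((a1 == a2).all())   # True

# RandomState instance: its internal state advances with each call it feeds,
# so the second call continues the stream and (almost certainly) returns a different split.
rng = np.random.RandomState(0)
b1, _, _, _ = train_test_split(X, y, random_state=rng)
b2, _, _, _ = train_test_split(X, y, random_state=rng)
print((b1 == b2).all())   # False in all but pathological cases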
Use a plain integer as random_state if you want repeatable splits, or leave it as None (the default) to get a different split on every run.
Using a RandomState instance makes sense only if you want several calls to share one generator with a fixed seed, so the whole sequence of random draws is reproducible as a unit rather than each call individually. See the official NumPy docs on numpy.random.RandomState for details.
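As a sketch of when the instance form is useful (assuming you want a single seed to govern a whole experiment rather than each call separately):
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)                # one seed for the whole experiment
X, y = make_classification(random_state=rng)   # consumes part of the stream
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rng)
rf = RandomForestClassifier(random_state=rng).fit(X_train, y_train)
print(rf.score(X_test, y_test))

# Re-running the whole block from RandomState(42) reproduces the same score,
# but repeating any single call in isolation would not, because the shared
# state has already advanced.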
Upvotes: 1