Reputation: 63
I am trying to generate different stratified splits of my data set using StratifiedKFold and its random_state parameter. However, when I use different random_state values, I still get the same splits. My understanding is that different random_state values should produce different splits. Please let me know what I am doing incorrectly. Here is the code:
import numpy as np
from sklearn.model_selection import StratifiedKFold

X_train = np.ones(10)
Y_train = np.ones(10)

# Two splitters that differ only in random_state
skf = StratifiedKFold(n_splits=5, random_state=0)
skf1 = StratifiedKFold(n_splits=5, random_state=100)

# Collect train/validation indices from the first splitter
trn1 = []
cv1 = []
for train, cv in skf.split(X_train, Y_train):
    trn1.append(train)
    cv1.append(cv)

# Collect train/validation indices from the second splitter
trn2 = []
cv2 = []
for train, cv in skf1.split(X_train, Y_train):
    trn2.append(train)
    cv2.append(cv)

# Print the folds of both splitters side by side
for c in range(5):
    print('Fold:' + str(c + 1))
    print(trn1[c])
    print(trn2[c])
    print(cv1[c])
    print(cv2[c])
Here is the output:
Fold:1
[2 3 4 5 6 7 8 9]
[2 3 4 5 6 7 8 9]
[0 1]
[0 1]
Fold:2
[0 1 4 5 6 7 8 9]
[0 1 4 5 6 7 8 9]
[2 3]
[2 3]
Fold:3
[0 1 2 3 6 7 8 9]
[0 1 2 3 6 7 8 9]
[4 5]
[4 5]
Fold:4
[0 1 2 3 4 5 8 9]
[0 1 2 3 4 5 8 9]
[6 7]
[6 7]
Fold:5
[0 1 2 3 4 5 6 7]
[0 1 2 3 4 5 6 7]
[8 9]
[8 9]
Upvotes: 4
Views: 2012
Reputation: 16079
As stated in the documentation:
random_state : int, RandomState instance or None, optional, default=None
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Used when shuffle == True.
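So simply add shuffle=True to your StratifiedKFold calls. With the default shuffle=False, random_state is ignored and the split is deterministic, which is why both splitters gave you identical folds. For example: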
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
skf1 = StratifiedKFold(n_splits=5, shuffle=True, random_state=100)
Output:
Fold:1
[0 1 3 4 5 6 7 9]
[0 1 2 3 4 5 8 9]
[2 8]
[6 7]
Fold:2
[0 1 2 3 5 6 7 8]
[0 2 3 4 6 7 8 9]
[4 9]
[1 5]
Fold:3
[0 2 3 4 5 7 8 9]
[0 1 3 5 6 7 8 9]
[1 6]
[2 4]
Fold:4
[0 1 2 4 5 6 8 9]
[1 2 4 5 6 7 8 9]
[3 7]
[0 3]
Fold:5
[1 2 3 4 6 7 8 9]
[0 1 2 3 4 5 6 7]
[0 5]
[8 9]
Upvotes: 4