Reputation: 1375
What is the random_state parameter of shuffle in sklearn.utils? Can anyone explain random_state with an example?
Upvotes: 8
Views: 17034
Reputation: 5183
Besides the cases that @Abhinav gave, random_state is useful in this situation: imagine you have two NumPy arrays or DataFrames with the same number of rows, and you need to shuffle their rows the same way (for example, the first row of both ends up 20th, the second ends up 5th, and so on).
You can do it by passing the same random_state in both statements:
import sklearn.utils
array1_shuffled = sklearn.utils.shuffle(array1, random_state=42)
array2_shuffled = sklearn.utils.shuffle(array2, random_state=42)
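To check that the rows really do stay paired, here is a minimal sketch. The arrays `features` and `labels` are made-up data for illustration, not anything from the question. It also shows that shuffle accepts several arrays in one call, which should keep them aligned without repeating the seed:

```python
import numpy as np
from sklearn.utils import shuffle

# Hypothetical data: five samples and their labels, aligned row by row.
features = np.arange(10).reshape(5, 2)   # rows: [0, 1], [2, 3], ..., [8, 9]
labels = np.array([0, 1, 2, 3, 4])       # label i belongs to row i

# Separate calls with the same seed apply the same permutation,
# because both arrays have the same number of rows.
features_a = shuffle(features, random_state=42)
labels_a = shuffle(labels, random_state=42)

# Passing both arrays in a single call also keeps them aligned.
features_b, labels_b = shuffle(features, labels, random_state=42)

# In this made-up data, the first column divided by 2 recovers the label,
# so we can verify that each row still sits next to its own label.
rows_match_a = np.array_equal(features_a[:, 0] // 2, labels_a)
rows_match_b = np.array_equal(features_b[:, 0] // 2, labels_b)
print(rows_match_a, rows_match_b)
```

The single-call form is usually the less error-prone choice, since there is only one seed to keep in sync.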
Upvotes: 2
Reputation: 3391
shuffle is used to shuffle the rows of your matrices randomly. Programmatically, random sequences are generated from a seed number, and you are guaranteed to get the same random sequence if you use the same seed. The random_state parameter lets you provide this seed to sklearn methods. This is useful because it makes the randomness reproducible during development and testing. So, with shuffle, if I use the same random_state with the same dataset, I am always guaranteed to get the same shuffle. Consider the following example:
import numpy as np
from sklearn.utils import shuffle
X = np.array([[1., 0.], [2., 1.], [0., 0.]])
X = shuffle(X, random_state=20)
Suppose this gives me the following output:
array([[ 0., 0.],
       [ 2., 1.],
       [ 1., 0.]])
Then I am guaranteed that whenever I use random_state=20 with this data, I will get exactly the same shuffling. This is particularly useful for unit tests, where you want reproducible results to assert against.
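As a rough sketch of what such a unit-test check could look like (the variable names `first` and `second` are just for illustration), you can shuffle the same data twice with the same seed and compare the results, without hard-coding any particular row order:

```python
import numpy as np
from sklearn.utils import shuffle

X = np.array([[1., 0.], [2., 1.], [0., 0.]])

# Two independent calls with the same seed on the same data.
first = shuffle(X, random_state=20)
second = shuffle(X, random_state=20)

# Same seed, same data: the row order is identical.
assert np.array_equal(first, second)

# The rows are only reordered, never altered.
assert sorted(map(tuple, first)) == sorted(map(tuple, X))

# shuffle returns a shuffled copy, so X itself is left untouched.
assert np.array_equal(X, np.array([[1., 0.], [2., 1.], [0., 0.]]))
```

If you leave random_state unset, the order can differ from one call to the next, so a test written this way would not be reliable.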
Hope that helps!
Upvotes: 21