Tavakoli

Reputation: 1375

Use of 'random_state' parameter in sklearn.utils.shuffle?

What is the random_state parameter of shuffle in sklearn.utils? Can anyone explain random_state with an example?

Upvotes: 8

Views: 17034

Answers (2)

ihebiheb

Reputation: 5183

Besides the cases that @Abhinav gave, random_state is useful in this situation: imagine you have two NumPy arrays or DataFrames with the same number of rows, and you need to shuffle their rows the same way (for example, the first row of both ends up 20th, the second ends up 5th, and so on).

You can do it by keeping the same random_state in both statements:

import sklearn.utils

# Same seed and same number of rows, so both get the same row order
array1_shuffled = sklearn.utils.shuffle(array1, random_state=42)
array2_shuffled = sklearn.utils.shuffle(array2, random_state=42)
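
To make this concrete, here is a small sketch. The arrays features and labels are made up for illustration (they are not from the question), and each label is set equal to the first column of its row so the alignment is easy to check. It also shows that passing both arrays to a single shuffle call should keep them aligned.

```python
import numpy as np
from sklearn.utils import shuffle

# Hypothetical data: row i of `features` belongs with labels[i].
features = np.array([[0, 0], [1, 10], [2, 20], [3, 30], [4, 40]])
labels = np.array([0, 1, 2, 3, 4])

# Two separate calls with the same seed apply the same permutation,
# because both inputs have the same number of rows.
features_a = shuffle(features, random_state=42)
labels_a = shuffle(labels, random_state=42)

# A single call with both arrays also keeps the rows aligned.
features_b, labels_b = shuffle(features, labels, random_state=42)

# The first column of each shuffled row still matches its label.
rows_aligned = bool(np.array_equal(features_a[:, 0], labels_a))
same_as_joint = bool(
    np.array_equal(features_a, features_b)
    and np.array_equal(labels_a, labels_b)
)
print(rows_aligned, same_as_joint)
```

If the arrays always travel together, the single call is probably the simpler choice, since there is only one seed to keep in sync.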

Upvotes: 2

Abhinav Arora

Reputation: 3391

shuffle randomly permutes the rows of your arrays or matrices. Programmatically, pseudo-random sequences are generated from a seed number, and the same seed always produces the same sequence. The random_state parameter lets you pass this seed to sklearn functions. This is useful because it lets you reproduce the randomness during development and testing. So if I call shuffle with the same random_state on the same dataset, I always get the same shuffle. Consider the following example:

import numpy as np
from sklearn.utils import shuffle

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
X = shuffle(X, random_state=20)

Suppose this gives me the following output:

array([[ 0.,  0.],
       [ 2.,  1.],
       [ 1.,  0.]])

Then whenever I use random_state=20 on this data, I will get exactly the same shuffling. This is particularly useful for unit tests, where you want reproducible results to assert against.
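
As a rough sketch of how that could look in a test, the snippet below shuffles the same array twice with the same seed and compares the results. The variable names are mine, and I am not asserting any particular row order, only that the two calls agree.

```python
import numpy as np
from sklearn.utils import shuffle

X = np.array([[1., 0.], [2., 1.], [0., 0.]])

# Same data and same seed: both calls should return the same row order.
first = shuffle(X, random_state=20)
second = shuffle(X, random_state=20)
repeatable = bool(np.array_equal(first, second))

# shuffle returns a shuffled copy, so X itself keeps its original order.
original_untouched = bool(
    np.array_equal(X, np.array([[1., 0.], [2., 1.], [0., 0.]]))
)

# The shuffled result contains the same rows, only reordered.
same_rows = sorted(map(tuple, first)) == sorted(map(tuple, X))
print(repeatable, original_untouched, same_rows)
```

With random_state left at its default of None, the order can differ from run to run, which is why tests usually pin it to a fixed integer.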

Hope that helps!

Upvotes: 21
