Reputation: 4723
I have a list of four strings. In a pandas DataFrame, I want to create a column by randomly selecting a value from this list for each row. I am using NumPy's random.choice, but according to its documentation it has no seed option. How can I set the random seed so that the random assignment is the same every time?
service_code_options = ['899.59O', '12.42R', '13.59P', '204.68L']
df['SERVICE_CODE'] = [np.random.choice(service_code_options) for _ in df.index]
Upvotes: 15
Views: 21177
Reputation: 2253
According to the notes for numpy.random.seed in NumPy v1.24:
Best practice is to use a dedicated Generator instance rather than the random variate generation methods exposed directly in the random module.
Such a Generator is constructed using np.random.default_rng.
Thus, instead of calling np.random.seed, the current best practice is to pass a seed to np.random.default_rng to construct a Generator, which can then be used to get reproducible results.
Combining jezrael's answer and the current best practice, we have:
import pandas as pd
import numpy as np
rng = np.random.default_rng(seed=121)
df = pd.DataFrame({'a':range(10)})
service_code_options = ['899.59O', '12.42R', '13.59P', '204.68L']
df['SERVICE_CODE'] = rng.choice(service_code_options, size=len(df))
print(df)
a SERVICE_CODE
0 0 12.42R
1 1 13.59P
2 2 12.42R
3 3 12.42R
4 4 899.59O
5 5 204.68L
6 6 204.68L
7 7 13.59P
8 8 12.42R
9 9 13.59P
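To check that the seed really makes the assignment repeatable, here is a minimal sketch. The helper name assign_codes is hypothetical (it is not part of pandas or NumPy); it wraps the same steps as above so they can be run twice with the same seed.

```python
import numpy as np
import pandas as pd

service_code_options = ['899.59O', '12.42R', '13.59P', '204.68L']

def assign_codes(n_rows, seed):
    """Hypothetical helper: build a frame with a seeded random SERVICE_CODE column."""
    rng = np.random.default_rng(seed)  # fresh Generator for every call
    df = pd.DataFrame({'a': range(n_rows)})
    df['SERVICE_CODE'] = rng.choice(service_code_options, size=len(df))
    return df

first = assign_codes(10, seed=121)
second = assign_codes(10, seed=121)

# Same seed, same Generator type, same call sequence: identical columns
print(first['SERVICE_CODE'].equals(second['SERVICE_CODE']))  # True
```

Note that the Generator is created inside the function. Reusing one rng object across calls would continue its stream and give different draws on the second call.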
Upvotes: 6
Reputation: 862681
You need to set the seed beforehand with numpy.random.seed. The list comprehension is also unnecessary, because numpy.random.choice accepts a size parameter:
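import numpy as np
import pandas as pd
# Seed the global NumPy random state before drawing
np.random.seed(123)
df = pd.DataFrame({'a': range(10)})
service_code_options = ['899.59O', '12.42R', '13.59P', '204.68L']
# One vectorized call draws a value for every row
df['SERVICE_CODE'] = np.random.choice(service_code_options, size=len(df))
print(df)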
a SERVICE_CODE
0 0 13.59P
1 1 12.42R
2 2 13.59P
3 3 13.59P
4 4 899.59O
5 5 13.59P
6 6 13.59P
7 7 12.42R
8 8 204.68L
9 9 13.59P
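One caveat: np.random.seed sets global state, so any other code that draws from np.random between the seed call and the choice call changes the result. A possible way around this, sketched below under the assumption that you want to stay with the legacy API, is a local np.random.RandomState instance. The variable names local_state and global_draw are only illustrative.

```python
import numpy as np
import pandas as pd

service_code_options = ['899.59O', '12.42R', '13.59P', '204.68L']
df = pd.DataFrame({'a': range(10)})

# Local legacy generator; it does not touch the global np.random state
local_state = np.random.RandomState(123)
df['SERVICE_CODE'] = local_state.choice(service_code_options, size=len(df))

# For comparison: seed the global state with the same value and draw again
np.random.seed(123)
global_draw = np.random.choice(service_code_options, size=len(df))

# Both routes use the same legacy algorithm and seed, so the draws match
print((df['SERVICE_CODE'].to_numpy() == global_draw).all())
```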
Upvotes: 12
Reputation: 294278
See the documentation for numpy.random.seed:
np.random.seed(this_is_my_seed)
The seed can be an integer or a list of integers:
np.random.seed(300)
Or
np.random.seed([3, 1415])
np.random.seed([3, 1415])
service_code_options = ['899.59O', '12.42R', '13.59P', '204.68L']
np.random.choice(service_code_options, 3)
array(['899.59O', '204.68L', '13.59P'], dtype='<U7')
Notice that I passed 3 to the choice function to specify the size of the array.
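Keep in mind that seeding fixes the sequence of draws, not each individual call. As a rough sketch of what that means in practice (the names first, second and third are just for illustration), you have to reseed before each call if you want the same values back:

```python
import numpy as np

service_code_options = ['899.59O', '12.42R', '13.59P', '204.68L']

np.random.seed([3, 1415])
first = np.random.choice(service_code_options, 3)

np.random.seed([3, 1415])  # reseed to restart the stream
second = np.random.choice(service_code_options, 3)

# Not reseeded: this continues the stream, so it need not match the draws above
third = np.random.choice(service_code_options, 3)

print(np.array_equal(first, second))  # True
print(first.shape)                    # (3,)
```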
Upvotes: 3