Reputation: 136359
I want to get reproducible samples of data. A quick experiment suggests that numpy.random.seed does influence pandas.DataFrame.sample, but this is not documented.
Does anybody know whether this can be relied on?
I ran the following a couple of times and always got the same results back:
#!/usr/bin/env python
import pandas as pd
import numpy as np
df = pd.DataFrame([(1, 2, 1),
                   (1, 2, 2),
                   (1, 2, 3),
                   (4, 1, 612),
                   (4, 1, 612),
                   (4, 1, 1),
                   (3, 2, 1),
                   ],
                  columns=['groupid', 'a', 'b'],
                  index=['India', 'France', 'England', 'Germany', 'UK', 'USA',
                         'Indonesia'])
np.random.seed(0)
print(df.sample(n=1))
print(df.sample(n=1))
print(df.sample(n=1))
print(df.sample(n=1))
print(df.sample(n=1))
Which gives:
Upvotes: 2
Views: 1551
Reputation: 862661
pandas uses the _random_state helper function to get an np.random.RandomState:
def _random_state(state=None):
    """
    Helper function for processing random_state arguments.

    Parameters
    ----------
    state : int, np.random.RandomState, None.
        If receives an int, passes to np.random.RandomState() as seed.
        If receives an np.random.RandomState object, just returns object.
        If receives `None`, returns np.random.
        If receives anything else, raises an informative ValueError.
        Default None.

    Returns
    -------
    np.random.RandomState
    """
    if types.is_integer(state):
        return np.random.RandomState(state)
    elif isinstance(state, np.random.RandomState):
        return state
    elif state is None:
        return np.random
    else:
        raise ValueError("random_state must be an integer, a numpy "
                         "RandomState, or None")
and this function is called inside sample:
# Process random_state argument
rs = com._random_state(random_state)
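So with the default random_state=None, sample draws from the global np.random state, which is why np.random.seed affects it. If you would rather not depend on global state, you can pass random_state explicitly. A minimal sketch (the small df and the variable names are only for illustration, not taken from the pandas source):

```python
import numpy as np
import pandas as pd

# Illustrative frame, loosely based on the one in the question.
df = pd.DataFrame({"groupid": [1, 1, 1, 4, 4, 4, 3],
                   "b": [1, 2, 3, 612, 612, 1, 1]},
                  index=["India", "France", "England", "Germany",
                         "UK", "USA", "Indonesia"])

# 1) An int is used as the seed of a fresh RandomState on every call,
#    so the same int gives the same rows each time.
first = df.sample(n=3, random_state=0)
second = df.sample(n=3, random_state=0)
same_with_int_seed = first.equals(second)

# 2) A RandomState object is used as-is, so repeated calls advance it.
#    Recreating it with the same seed replays the same sequence of draws.
rs = np.random.RandomState(0)
draws_a = [df.sample(n=1, random_state=rs).index[0] for _ in range(5)]
rs = np.random.RandomState(0)
draws_b = [df.sample(n=1, random_state=rs).index[0] for _ in range(5)]
same_with_random_state = draws_a == draws_b

# 3) random_state=None falls back to the global np.random state,
#    so reseeding it with np.random.seed reproduces the sample.
np.random.seed(0)
global_first = df.sample(n=3)
np.random.seed(0)
global_second = df.sample(n=3)
same_with_global_seed = global_first.equals(global_second)

print(same_with_int_seed, same_with_random_state, same_with_global_seed)
```

Passing random_state keeps the result reproducible even if some other code calls np.random functions in between, which is not the case when you only rely on np.random.seed.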
Upvotes: 1