Reputation: 12515
Suppose I have a program called script.py:
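import pandas as pd
import numpy as np
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead
from sklearn.model_selection import train_test_split

if __name__ == "__main__":
    df = pd.DataFrame({"x": [1,2,3,4,5,6,6,5,6,3], "y": [1,1,0,0,0,0,1,0,0,1]})
    # Fixed random_state so the split is reproducible
    train, test = train_test_split(df, test_size=0.20, random_state=100)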
If I run this script from my command line once:
H:\>python script.py
How can I ensure that the train and test dataframes in subsequent runs (i.e. when I run script.py again) are identical to the train and test dataframes from previous iterations? I know the random_state works if you don't leave the console, but would the equality of these train and test sets be preserved if I came back tomorrow, turned my PC back on, and re-ran script.py?
I am testing the accuracies of different machine learning algorithms, all stored in different scripts, which is why I want to make sure the train and test sets are identical across scripts.
Upvotes: 0
Views: 294
Reputation: 66805
The random state has nothing to do with when you run your code. The whole point of specifying a random state is to get exactly the same results every time you run the code with the same parameters. So as long as you do not change df, test_size, and random_state, this function will always return the same split, no matter how many days pass. It might, however, change if you update the underlying library.
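As a rough sanity check, here is a minimal sketch that builds the split twice from scratch and compares the results. The helper name make_split is hypothetical (it is not part of scikit-learn), and the import assumes a scikit-learn version that provides sklearn.model_selection. Running both calls in one process only approximates "re-running the script tomorrow", but the seed is the only source of randomness either way.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def make_split():
    """Hypothetical helper: rebuild the data and split it with a fixed seed."""
    df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 6, 5, 6, 3],
                       "y": [1, 1, 0, 0, 0, 0, 1, 0, 0, 1]})
    return train_test_split(df, test_size=0.20, random_state=100)


# Two independent calls with identical inputs and the same random_state
train_a, test_a = make_split()
train_b, test_b = make_split()

# DataFrame.equals compares values and index, so this checks the same rows
# ended up in the same order in each part of the split
same_train = train_a.equals(train_b)
same_test = test_a.equals(test_b)
print("train identical:", same_train)
print("test identical:", same_test)
```

If you are worried about the library-update caveat, one option is to write train and test to disk once (for example with DataFrame.to_csv) and have every script load those files instead of re-splitting.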
Upvotes: 1