random_state maintained when running script again?

Question

Suppose I have a program, called script.py:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20

if __name__ == "__main__":
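    # toy data: 10 rows, one feature "x" and a binary label "y"
    df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 6, 5, 6, 3], "y": [1, 1, 0, 0, 0, 0, 1, 0, 0, 1]})

    # hold out 20% of the rows; the integer random_state seeds the shuffle
    train, test = train_test_split(df, test_size=0.20, random_state=100)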

If I run this script from my command line once:

H:\>python script.py

How can I ensure that the train and test dataframes produced in later runs (i.e. each time I run script.py again) are identical to the ones produced in earlier runs? I know that random_state gives the same split when I repeat the call within the same console session, but would the train and test sets still be identical if I came back tomorrow, turned my PC back on, and re-ran script.py?
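A sketch of why an integer seed should survive a reboot: scikit-learn turns an integer random_state into a freshly seeded numpy.random.RandomState on each call, so the split depends only on the seed, the data (including its row order) and the library versions, not on anything left over in memory. The snippet below checks the underlying generator rather than train_test_split itself: it runs the same seeded permutation in two separate Python processes to mimic running script.py on two different days. The CHILD string and the helper name are illustrative only.

```python
import json
import subprocess
import sys

# Code executed in a brand-new interpreter each time, mimicking a fresh run of script.py.
# An integer seed gives a generator whose output depends only on that seed.
CHILD = (
    "import json\n"
    "import numpy as np\n"
    "rng = np.random.RandomState(100)\n"
    "print(json.dumps(rng.permutation(10).tolist()))\n"
)


def permutation_in_fresh_process():
    """Run CHILD in a separate Python process and return the permutation it prints."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


first = permutation_in_fresh_process()
second = permutation_in_fresh_process()
print(first == second)  # True: same seed, same shuffle, even across separate processes
```

The equality can still break if the DataFrame rows arrive in a different order, or if a scikit-learn upgrade changes how the split is drawn from the generator, so pinning library versions may be worth considering when splits must match over long periods.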

I am comparing the accuracies of different machine learning algorithms, each stored in its own script, so I want to make sure that every script uses exactly the same train and test sets.
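When several scripts must see exactly the same rows, one option is to compute the split once and store the chosen row positions in a file that every script reads. The sketch below is a hand-rolled seeded shuffle split with numpy, not train_test_split itself; the function name get_split_indices and the file name split_indices.json are hypothetical. Storing the positions also guards against a future library upgrade silently changing the split.

```python
import json
import math
import os
import tempfile

import numpy as np

SPLIT_FILE = "split_indices.json"  # hypothetical file shared by all algorithm scripts


def get_split_indices(n_rows, test_size=0.20, seed=100, path=SPLIT_FILE):
    """Return (train_idx, test_idx) as lists of row positions.

    The first call draws a seeded shuffle and saves it to `path`; later calls
    (from this or any other script) reload the saved positions instead.
    """
    if os.path.exists(path):
        with open(path) as f:
            saved = json.load(f)
        if saved["n_rows"] == n_rows:  # only reuse a split made for the same data size
            return saved["train"], saved["test"]

    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_rows).tolist()
    n_test = math.ceil(test_size * n_rows)
    split = {"n_rows": n_rows, "test": perm[:n_test], "train": perm[n_test:]}
    with open(path, "w") as f:
        json.dump(split, f)
    return split["train"], split["test"]


# Demo: two "runs" sharing a throwaway file instead of the real SPLIT_FILE.
demo_path = os.path.join(tempfile.mkdtemp(), SPLIT_FILE)
run_1 = get_split_indices(10, path=demo_path)  # computes and saves the split
run_2 = get_split_indices(10, path=demo_path)  # loads the saved split from disk
print(run_1 == run_2)  # True
```

In each algorithm script the positions would then be applied with df.iloc[train_idx] and df.iloc[test_idx], which assumes every script loads the same DataFrame in the same row order.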
