boot-scootin

Reputation: 12515

random_state maintained when running script again?

Suppose I have a program, called script.py:

import pandas as pd
import numpy as np
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead
from sklearn.model_selection import train_test_split

if __name__ == "__main__":
    df = pd.DataFrame({"x": [1,2,3,4,5,6,6,5,6,3], "y": [1,1,0,0,0,0,1,0,0,1]})

    train, test = train_test_split(df, test_size = 0.20, random_state = 100)

If I run this script from my command line once:

H:\>python script.py

How can I ensure that the train and test dataframes in subsequent runs (i.e. when I run script.py again) are identical to the train and test dataframes from previous iterations? I know the random_state works if you don't leave the console, but would the equality of these train and test sets be preserved if I came back tomorrow, turned my PC back on, and re-ran script.py?

I am testing the accuracies of different machine learning algorithms, all stored in different scripts, which is why I want to make sure the train and test sets are identical across scripts.

Upvotes: 0

Views: 294

Answers (1)

lejlot

Reputation: 66805

The random state has nothing to do with when you run your code. The whole point of specifying a random state is to get exactly the same results every time you run the code with the same parameters. So as long as you do not change df, test_size, and random_state, this function will always return the same values, no matter how many days pass or how often you restart your machine. The split might, however, change if you update the underlying library.
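As a rough illustration, the sketch below repeats the split from the question twice with the same seed and checks that both results match. Because a library update could in principle change the split, it also shows one possible way to pin the split down for several scripts: store the test-set row labels in a file and rebuild the split from them. The helper names `save_test_index` and `load_split` and the file name `split_indices.json` are made up for this example; they are not part of scikit-learn or pandas.

```python
import json

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 6, 5, 6, 3],
                   "y": [1, 1, 0, 0, 0, 0, 1, 0, 0, 1]})

# Two independent calls with the same data, test_size and random_state.
train_a, test_a = train_test_split(df, test_size=0.20, random_state=100)
train_b, test_b = train_test_split(df, test_size=0.20, random_state=100)

# Both calls produce the same rows in the same order.
print(train_a.equals(train_b) and test_a.equals(test_b))


def save_test_index(test, path):
    """Hypothetical helper: write the test-set row labels to a JSON file."""
    with open(path, "w") as fh:
        json.dump([int(i) for i in test.index], fh)


def load_split(frame, path):
    """Hypothetical helper: rebuild (train, test) from stored row labels.

    The train rows come back in the original order of `frame`, not in the
    shuffled order that train_test_split returns.
    """
    with open(path) as fh:
        test_idx = json.load(fh)
    return frame.drop(index=test_idx), frame.loc[test_idx]
```

One script would call `save_test_index(test_a, "split_indices.json")` once, and every other script would call `load_split(df, "split_indices.json")`. This assumes each script builds df with the same row labels.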

Upvotes: 1
