I am using the Extreme Learning Machine (ELM) algorithm. I have two files, one for the training dataset and one for the testing dataset, and I normalize my data. I get different results every time I run the code; how can I make my results reproducible?
My code:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from scipy.linalg import pinv2
#import dataset
train = pd.read_excel('INRStrai.xlsx')
test = pd.read_excel('INRStes.xlsx')
#scale data
scaler = MinMaxScaler()
X_train = scaler.fit_transform(train.values[:,1:])
y_train = scaler.fit_transform(train.values[:,:1])
X_test = scaler.fit_transform(test.values[:,1:])
y_test = scaler.fit_transform(test.values[:,:1])
#input size
input_size = X_train.shape[1]
#Number of neurons
hidden_size = 300
#weights & biases
input_weights = np.random.normal(size=[input_size,hidden_size])
biases = np.random.normal(size=[hidden_size])
#Activation Function
def relu(x):
    return np.maximum(x, 0, x)
#Calculations
def hidden_nodes(X):
    G = np.dot(X, input_weights)
    G = G + biases
    H = relu(G)
    return H
#Output weights
output_weights = np.dot(pinv2(hidden_nodes(X_train)), y_train)
#Def prediction
def predict(X):
    out = hidden_nodes(X)
    out = np.dot(out, output_weights)
    return out
#PREDICTION
prediction = predict(X_test)
The only source of randomness in the code you show here is the initialization of your weights & biases:
#weights & biases
input_weights = np.random.normal(size=[input_size,hidden_size])
biases = np.random.normal(size=[hidden_size])
So, the only thing you need to do to make this code reproducible is to explicitly set the NumPy random seed before initializing your weights & biases:
seed = 42 # can be any number, and the exact value does not matter
np.random.seed(seed)
#weights & biases
input_weights = np.random.normal(size=[input_size,hidden_size])
biases = np.random.normal(size=[hidden_size])
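As a side note, on recent NumPy versions (assumption: NumPy >= 1.17) you could instead draw the weights from a locally seeded Generator, which avoids relying on the global random state; a minimal sketch, reusing the existing import numpy as np:
# Sketch (assumes NumPy >= 1.17): a locally seeded Generator instead of the global seed
rng = np.random.default_rng(42)  # seed value is arbitrary, it only needs to be fixed
input_weights = rng.normal(size=[input_size, hidden_size])
biases = rng.normal(size=[hidden_size])
Either approach works; the point is that the same seed gives the same weights, hence the same output weights and predictions on every run.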
Irrelevant to your issue, but we never use fit_transform (or anything involving fit, for that matter) on test data - use simply transform instead; you should also use 2 different scalers for the features X and the labels y:
#scale data
scaler_X = MinMaxScaler()
scaler_Y = MinMaxScaler()
# fit_transform for training data:
X_train = scaler_X.fit_transform(train.values[:,1:])
y_train = scaler_Y.fit_transform(train.values[:,:1])
# only transform for test (unseen) data:
X_test = scaler_X.transform(test.values[:,1:])
y_test = scaler_Y.transform(test.values[:,:1])
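Using a separate scaler for the labels also lets you map the (scaled) predictions back to the original target units; a minimal sketch, assuming the ELM code above has already been run with these scalers:
# Sketch: convert scaled predictions back to the original scale of y
prediction_scaled = predict(X_test)                            # shape (n_samples, 1), in [0, 1]
prediction_original = scaler_Y.inverse_transform(prediction_scaled)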