Billy P. Witt

Reputation: 21

TensorFlow variable values different on the same training set

I built a neural network model in Python 3.6.

I'm trying to predict the price of condominiums based on attributes such as latitude, longitude, distance to public transport, year built, and so on.

I use the same training set for the model. However, each time I run it, the printed values of the variables in the hidden layer are different.

    import numpy as np
    import pandas as pd
    import tensorflow as tf
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.model_selection import train_test_split

    # training_df / testing_df are the condo DataFrames loaded earlier
    testing_df_w_price = testing_df.copy()
    testing_df.drop('PricePerSq',axis = 1, inplace = True)
    training_df, testing_df = training_df.drop(['POID'], axis=1), testing_df.drop(['POID'], axis=1)

    col_train = list(training_df.columns)
    col_train_bis = list(training_df.columns)

    col_train_bis.remove('PricePerSq')
    mat_train = np.matrix(training_df)
    mat_test  = np.matrix(testing_df)
    mat_new = np.matrix(training_df.drop('PricePerSq', axis = 1))
    mat_y = np.array(training_df.PricePerSq).reshape((training_df.shape[0],1))

    prepro_y = MinMaxScaler()
    prepro_y.fit(mat_y)

    prepro = MinMaxScaler()
    prepro.fit(mat_train)

    prepro_test = MinMaxScaler()
    prepro_test.fit(mat_new)

    train = pd.DataFrame(prepro.transform(mat_train),columns = col_train)
    test  = pd.DataFrame(prepro_test.transform(mat_test),columns = col_train_bis)

    # List of features
    COLUMNS = col_train
    FEATURES = col_train_bis
    LABEL = "PricePerSq"

    # Columns for tensorflow
    feature_cols = [tf.contrib.layers.real_valued_column(k) for k in FEATURES]

    # Training set and Prediction set with the features to predict
    training_set = train[COLUMNS]
    prediction_set = train.PricePerSq

    # Train and Test
    x_train, x_test, y_train, y_test = train_test_split(training_set[FEATURES] , prediction_set, test_size=0.25, random_state=42)

    y_train = pd.DataFrame(y_train, columns = [LABEL])

    training_set = pd.DataFrame(x_train, columns = FEATURES).merge(y_train, left_index = True, right_index = True) # good

    # Training for submission
    training_sub = training_set[col_train] # good

    # Same thing but for the test set
    y_test = pd.DataFrame(y_test, columns = [LABEL])
    testing_set = pd.DataFrame(x_test, columns = FEATURES).merge(y_test, left_index = True, right_index = True) # good

    # Model
    # tf.logging.set_verbosity(tf.logging.INFO)
    tf.logging.set_verbosity(tf.logging.ERROR)
    regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_cols,
                                              hidden_units=[int((len(col_train) + 1) / 2)],  # roughly half as many hidden units as columns
                                              model_dir = "/tmp/tf_model")
    for k in regressor.get_variable_names():
        print(k)
        print(regressor.get_variable_value(k))

Example of the hidden-layer values differing between runs

Upvotes: 2

Views: 59

Answers (2)

ITiger

Reputation: 1081

In machine learning, the current "knowledge state" of your neural network is expressed through the weights of the connections in your graph. Viewed as a whole, your network represents a high-dimensional function, and the task of learning means finding the global optimum of that function. The learning process changes the weights of the connections in your neural network according to the specified optimizer, which in your case is the default of tf.contrib.learn.DNNRegressor (the Adagrad optimizer). But there are other parameters that affect the final "knowledge state" of your model, for instance (with no claim of completeness):

  • The initial learning rate in your model
  • The learning rate schedule that adapts the learning rate over time
  • Any regularization and early-stopping criteria you may have defined
  • The strategy used for weight initialization (e.g. He initialization or plain random initialization)

Plus (and this is maybe the most important point for understanding why your weights differ after each retraining), you have to consider that training uses a stochastic gradient descent algorithm. This means that for each optimization step the algorithm chooses a random subset of your whole training set. Therefore, a single optimization step doesn't always point toward the global optimum of your high-dimensional function, but rather toward the steepest descent that can be computed from the randomly chosen subset. Because of this stochastic component in the optimization process, you will likely never reach the global optimum for your task. But with carefully chosen hyperparameters (and of course good data) you will reach a good approximate solution, which lies within a local optimum of the function and which can change every time you retrain the model.
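
If you want to check that these random components really are the cause, you can pin them down. A minimal sketch, assuming your TensorFlow 1.x version exposes RunConfig's tf_random_seed argument, and reusing feature_cols and col_train from the question's code:

    import numpy as np
    import tensorflow as tf

    # Fix the NumPy seed (covers shuffling done outside TensorFlow, e.g. train_test_split)
    np.random.seed(42)

    # Fix the graph-level TensorFlow seed through the estimator's RunConfig
    config = tf.contrib.learn.RunConfig(tf_random_seed=42)

    regressor = tf.contrib.learn.DNNRegressor(
        feature_columns=feature_cols,
        hidden_units=[int((len(col_train) + 1) / 2)],
        model_dir="/tmp/tf_model_seeded",  # fresh dir so no old checkpoint is restored
        config=config)

With the seeds fixed and a fresh model_dir for every run, the variable values you print after fitting on the same data should be reproducible.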

So to conclude, don't look at the weights to judge the performance of your model, because they will be slightly different each time you retrain. Use a performance measure like the accuracy computed via cross-validation or a confusion matrix computed on the test set.
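
Since this is a regression task, an error metric such as the RMSE on the held-out test set plays that role. A minimal sketch with made-up numbers (in practice y_true comes from your test split and y_pred from the regressor's predictions):

    import numpy as np
    from sklearn.metrics import mean_squared_error

    # Hypothetical scaled prices; replace with your test labels and model predictions
    y_true = np.array([0.30, 0.55, 0.72, 0.41])
    y_pred = np.array([0.28, 0.60, 0.70, 0.45])

    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    print("Test RMSE:", rmse)

Unlike the raw weights, this number is directly comparable between retrainings.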

P.S. tf.contrib.learn.DNNRegressor is a deprecated function in the newest TensorFlow release, as you can see in the docs. Use tf.estimator.DNNRegressor instead.
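
For reference, a minimal sketch of that replacement, where the feature names are placeholders for whatever is in FEATURES and tf.feature_column.numeric_column stands in for the deprecated real_valued_column:

    import tensorflow as tf

    FEATURES = ["lat", "lng", "dist_to_transport", "year_built"]  # placeholder names

    feature_cols = [tf.feature_column.numeric_column(k) for k in FEATURES]

    regressor = tf.estimator.DNNRegressor(
        feature_columns=feature_cols,
        hidden_units=[10],
        model_dir="/tmp/tf_model_estimator")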

Upvotes: 0

user2653663

Reputation: 2948

The variables are initialized with random values when you construct the network. Since there are likely to be many local minima of your loss function, the fitted parameters will change every time you run the network. In addition, even if your loss function were convex (with only one, global, minimum), the ordering of the variables would still be somewhat arbitrary. If, for example, you fit a network with one hidden layer containing two hidden nodes, the parameters of node 1 in your first run might correspond to the parameters of node 2 in another run, and vice versa.
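
To see this random initialization in isolation, here is a minimal sketch with plain TF 1.x variables rather than the DNNRegressor itself: with the seed line commented out, the printed matrix changes on every run; with it in place, it is the same each time.

    import tensorflow as tf

    tf.reset_default_graph()
    tf.set_random_seed(42)  # comment this out and the values differ on every run

    # A small weight matrix, initialized the way hidden-layer weights typically are
    w = tf.get_variable("hidden/weights", shape=[5, 2],
                        initializer=tf.glorot_uniform_initializer())

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(w))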

Upvotes: 1
