Tom

Reputation: 1063

Simple linear regression in TensorFlow produces near-zero coefficient

I am attempting a simple linear regression in TensorFlow with only one independent variable. A plot of my data shows the coefficient should be near 1, and in fact running it with sklearn.linear_model.LinearRegression gives a sensible result of about 0.90.

However, running it in TensorFlow following this tutorial produces a coefficient very near zero. I was able to get a reasonable result from TensorFlow using randomized numbers. I have tried adjusting the learning rate and the number of epochs without any meaningful effect.

The MRE below includes my actual data, and should produce a coefficient of 0.8975 from sklearn but 0.00045 from TensorFlow. I have considered that it is getting caught in a local minimum, but none of the examples of that problem I can find match my issue.

import numpy as np
import tensorflow as tf
from sklearn import linear_model

learning_rate = 0.1
epochs = 100

x_train = np.array([-0.00055, 0.00509, -0.0046, -0.01687, -0.0047, 0.00348, 
                0.00042, -0.00208, -0.01207, -0.0007, 0.00408, -0.00182, 
                -0.00294, -0.00113, 0.0038, -0.00645, 0.00113, 0.00268, 
                -0.0045, -0.00381, 0.00298, 0, -0.00184, -0.00212, 
                -0.00213, -0.01224, 0.00072, 0, -0.00331, 0.00534, 
                0.00675, -0.00285, -0.00429, 0.00489, -0.00286, 0.00158, 
                0.00129, 0.00472, 0.00555, -0.00467, -0.00231, -0.00231, 
                0.00159, -0.00463, 0.00174, 0, -0.0029, 
                -0.00349, 0.01372, -0.00302])

y_train = np.array([0.00125, 0.00218, -0.00373, -0.00999, -0.00441, 
                0.00412, 0.00158, -0.00094, -0.01513, -0.00064, 0.00416, 
                -0.00191, -0.00607, 0.00161, 0.00289, -0.00416, 
                0.00096, 0.00321, -0.00672, -0.0029, 0.00129, -0.00032, 
                -0.00387, -0.00162, -0.00292, -0.01367, 0.00198, 
                0.00099, -0.00329, 0.00693, 0.00459, -0.00294, -0.00164, 
                0.00328, -0.00425, 0.00131, 0.00131, 0.00524, 0.00358,
                -0.00422, -0.00065, -0.00359, 0.00229, 0, 0.00196, 
                -0.00065, -0.00391, -0.0108, 0.01291, -0.00098])

regr = linear_model.LinearRegression()
regr.fit(x_train.reshape(-1, 1), y_train.reshape(-1, 1))
print ('Coefficients: ', regr.coef_)

weight = tf.Variable(0.)
bias = tf.Variable(0.)

for e in range(epochs):
    with tf.GradientTape() as tape:
        y_pred = weight*x_train + bias
        loss = tf.reduce_mean(tf.square(y_pred - y_train))
    # compute gradients outside the tape context, then take a plain
    # gradient-descent step
    gradients = tape.gradient(loss, [weight, bias])
    weight.assign_sub(gradients[0]*learning_rate)
    bias.assign_sub(gradients[1]*learning_rate)

print(weight.numpy(), 'weight', bias.numpy(), 'bias')

Upvotes: 1

Views: 164

Answers (1)

Ahmed AEK

Reputation: 18339

In the posted example, the x and y values in the training dataset are very small, which makes the gradients very small. The model is training correctly on the data, but at this rate it would take a few million iterations to converge.
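
To see roughly why, here is a back-of-the-envelope sketch (reusing x_train and y_train from the question, and assuming the bias term stays near zero): the MSE gradient at the starting point weight = 0 is tiny, and each step shrinks the remaining weight error by a factor very close to 1.

    import numpy as np

    learning_rate = 0.1  # same value as in the question

    # gradient of the MSE loss w.r.t. the weight at weight=0, bias=0:
    # dL/dw = -2 * mean(x * (y - 0))
    grad_w = -2 * np.mean(x_train * y_train)
    print(grad_w)  # on the order of -5e-5 for this data

    # each step multiplies the remaining weight error by roughly
    # (1 - 2 * learning_rate * mean(x**2)), so shrinking the error
    # by a factor of e takes about this many iterations:
    print(1 / (2 * learning_rate * np.mean(x_train**2)))  # ~2e5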

The scikit-learn LinearRegression model, by contrast, solves the least-squares problem in closed form, so it fits the dataset in one shot regardless of the scale of the data.
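
For comparison, a minimal sketch of that closed-form fit using np.polyfit on the question's raw arrays:

    import numpy as np

    # closed-form least-squares fit of a degree-1 polynomial -- the same
    # problem LinearRegression solves, with no iteration involved
    slope, intercept = np.polyfit(x_train, y_train, 1)
    print(slope)  # ~0.8975, matching the sklearn coefficient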

One suggestion to bring training down to a manageable 1000 iterations is to apply MinMaxScaler so that the x and y datasets lie between 0 and 1. This improves the gradients and lets the model reach a trained state, but you should inverse-transform the results back after training, as shown in the modified code below.

    import numpy as np
    import tensorflow as tf
    from sklearn import linear_model
    from sklearn.preprocessing import MinMaxScaler
    import matplotlib.pyplot as plt
    learning_rate = 0.1
    epochs = 1000
    
    x_train0 = np.array([-0.00055, 0.00509, -0.0046, -0.01687, -0.0047, 0.00348,
                    0.00042, -0.00208, -0.01207, -0.0007, 0.00408, -0.00182,
                    -0.00294, -0.00113, 0.0038, -0.00645, 0.00113, 0.00268,
                    -0.0045, -0.00381, 0.00298, 0, -0.00184, -0.00212,
                    -0.00213, -0.01224, 0.00072, 0, -0.00331, 0.00534,
                    0.00675, -0.00285, -0.00429, 0.00489, -0.00286, 0.00158,
                    0.00129, 0.00472, 0.00555, -0.00467, -0.00231, -0.00231,
                    0.00159, -0.00463, 0.00174, 0, -0.0029,
                    -0.00349, 0.01372, -0.00302])
    scaler1 = MinMaxScaler()
    x_train = scaler1.fit_transform(x_train0.reshape(-1,1))
    y_train0 = np.array([0.00125, 0.00218, -0.00373, -0.00999, -0.00441,
                    0.00412, 0.00158, -0.00094, -0.01513, -0.00064, 0.00416,
                    -0.00191, -0.00607, 0.00161, 0.00289, -0.00416,
                    0.00096, 0.00321, -0.00672, -0.0029, 0.00129, -0.00032,
                    -0.00387, -0.00162, -0.00292, -0.01367, 0.00198,
                    0.00099, -0.00329, 0.00693, 0.00459, -0.00294, -0.00164,
                    0.00328, -0.00425, 0.00131, 0.00131, 0.00524, 0.00358,
                    -0.00422, -0.00065, -0.00359, 0.00229, 0, 0.00196,
                    -0.00065, -0.00391, -0.0108, 0.01291, -0.00098])
    scaler2 = MinMaxScaler()
    y_train = scaler2.fit_transform(y_train0.reshape(-1,1))
    
    regr = linear_model.LinearRegression()
    regr.fit(x_train, y_train)  # already 2D after the scaler transform
    print('Coefficients: ', regr.coef_, ' intercept ', regr.intercept_)
    
    weight = tf.Variable(0.)
    bias = tf.Variable(0.)
    
    for e in range(epochs):
        with tf.GradientTape() as tape:
            y_pred = weight*x_train + bias
            loss = tf.reduce_mean(tf.square(y_pred - y_train))
        # gradient step on the 0-1 scaled data
        gradients = tape.gradient(loss, [weight, bias])
        weight.assign_sub(gradients[0]*learning_rate)
        bias.assign_sub(gradients[1]*learning_rate)
    
    
    print(weight.numpy(), 'weight', bias.numpy(), 'bias')
    
    # plot the model output, inverse-transformed back to the original scale
    plt.plot(x_train0, scaler2.inverse_transform(y_pred.numpy()).flatten(), 'r', label='model output')
    plt.scatter(x_train0, y_train0, label='training dataset')
    plt.legend()
    plt.show()

Coefficients: [[0.97913471]] intercept [-0.00420121]

0.96772194 weight 0.0018798028 bias

[figure: fit curve - model output plotted over the training dataset]
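
If you also want the fitted coefficients expressed in the original units rather than the 0-1 scaled ones, the min-max scaling can be undone analytically. A minimal sketch, using the data_min_ and data_range_ attributes of the fitted scaler1/scaler2 objects from the code above:

    # y_scaled = w*x_scaled + b, with x_scaled = (x - x_min)/x_range and
    # y_scaled = (y - y_min)/y_range, rearranges in original units to:
    #   slope     = w * y_range / x_range
    #   intercept = y_min + y_range * (b - w * x_min / x_range)
    x_min, x_range = scaler1.data_min_[0], scaler1.data_range_[0]
    y_min, y_range = scaler2.data_min_[0], scaler2.data_range_[0]
    w, b = weight.numpy(), bias.numpy()
    print('slope:', w * y_range / x_range)  # ~0.89, close to sklearn on the raw data
    print('intercept:', y_min + y_range * (b - w * x_min / x_range))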

Upvotes: 1
