
Reputation: 533

I can't get my TensorFlow gradient descent linear regression algorithm to work

I'm trying to write a simple TensorFlow linear regression model that takes a subset of the Boston housing data, with the number of rooms (RM) column as the independent variable and the median price (MEDV) as the dependent variable, and applies a gradient descent algorithm to it.

However, when I run it, the optimizer doesn't seem to work. The cost never decreases and the weight actually increments in the wrong direction.

Here are the various plots that I constructed

  1. Scatter plot of x and y

  2. PCA analysis plot

  3. Original data fit

  4. Testing data fit.

The output of my program looks like this:

Epoch: 0050 cost= 6393135366144.000000000 W = 110392.0 b = 456112.0

Epoch: 0100 cost= 6418308005888.000000000 W = 111131.0 b = 459181.0

Epoch: 0150 cost= 6418496225280.000000000 W = 111136.0 b = 459203.0

Epoch: 0200 cost= 6418497798144.000000000 W = 111136.0 b = 459203.0


Epoch: 1000 cost= 6418497798144.000000000 W = 111136.0 b = 459203.0

Note that the cost never decreases and, in fact, the weight increases slightly when it should decrease.

I have no idea why this is happening. The data seems reasonable, as far as I can tell, and I am at a loss to figure out why the optimizer isn't working. The code itself is just a standard TensorFlow linear regression example that I pulled off the internet and modified for my data set.

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.mlab import PCA
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import tensorflow as tf
import sys
from sklearn import model_selection
from sklearn import preprocessing

def pca(dataset):

    results = PCA(dataset)
    x = []
    y = []

    for item in results.Y:
        # Keep the first two principal components of each sample.
        x.append(item[0])
        y.append(item[1])

    fig1 = plt.figure()
    pltData = [x,y]
    xAxisLine = ((min(pltData[0]),max(pltData[0])),(0,0),(0,0))
    yAxisLine = ((min(pltData[1]),max(pltData[1])),(0,0),(0,0))

rng = np.random
# learning_rate is the alpha value that we pass to the gradient descent algorithm. 
learning_rate = 0.1

# How many cycles we're going to run to try and get our optimum fit. 
training_epochs = 1000
display_step =  50

# We're going to pull in the CSV file and extract the X value (RM) and Y value (MEDV)

boston_dataset = pd.read_csv('data/housing.csv')
label = boston_dataset['MEDV']
# Recent pandas versions have no Series.reshape, so go through the underlying NumPy array.
features = boston_dataset['RM'].values.reshape(-1,1)
dataset = np.column_stack((np.asarray(boston_dataset['RM']),np.asarray(boston_dataset['MEDV'])))


train_X, test_X, train_Y, test_Y = model_selection.train_test_split(features, label, test_size = 0.33, 
                                 random_state = 5)

scaler =  preprocessing.StandardScaler()
train_X = scaler.fit_transform(train_X)
# This is the total number of data samples that we're going to run through. 
n_samples = train_X.shape[0]

# Variable placeholders. 
X = tf.placeholder('float')
Y = tf.placeholder('float')

W = tf.Variable(rng.randn(), name = 'weight')
b = tf.Variable(rng.randn(), name = 'bias')

# Here we describe our training model.  It's a linear regression model using the standard y = mx + b 
# slope-intercept formula. We calculate the cost as half the mean squared error.

# This is our prediction algorithm: y = mx + b
prediction = tf.add(tf.multiply(X,W),b)

# Let's now calculate the cost of the prediction as half the mean squared error

training_cost = tf.reduce_sum(tf.pow(prediction-Y,2))/(2 * n_samples)   
# This is our gradient descent optimizer algorithm.  We're passing in alpha, our learning rate
# and we want the minimum value of the training cost.  
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(training_cost)

init = tf.global_variables_initializer()

# Now we'll run our training data through our model.
with tf.Session() as tf_session:

    # Initialize all of our tensorflow variables.
    tf_session.run(init)

    # We'll run the data through 1000 times (the value of training_epochs).
    for epoch in range(training_epochs):

        # For each training cycle, pass in the x and y values to our optimizer algorithm, one sample at a time.
        for (x,y) in zip(train_X,train_Y):
            tf_session.run(optimizer, feed_dict = {X: x, Y: y})

        # For every fifty cycles, let's check and see how we're doing.
        if (epoch + 1 ) % 50 == 0:
            c = tf_session.run(training_cost, feed_dict = {X: train_X, Y: train_Y})
            print ('Epoch: ', '%04d' % 
                   (epoch+1),'cost=','{:.9f}'.format(c), \
                   'W = ', tf_session.run(W), 'b = ', tf_session.run(b))

    print ('Optimization finished')
    # Evaluate the cost tensor so that we print a number rather than the tensor object.
    final_training_cost = tf_session.run(training_cost, feed_dict = {X: train_X, Y: train_Y})
    print ('Training cost = ',final_training_cost,' W = ', tf_session.run(W), ' b  = ', tf_session.run(b),'\n')

    plt.plot(train_X, train_Y, 'ro',label='Original data')

    plt.plot(train_X, tf_session.run(W) * train_X + tf_session.run(b), label = 'Fitted line')

    # We're now going to run test data to see how well our trained model works. 

    print ('Testing...(mean square loss comparison)')
    testing_cost = tf_session.run(tf.reduce_sum(tf.pow(prediction - Y, 2)) / (2 * test_Y.shape[0]), feed_dict = {X: test_X, Y: test_Y})
    print ('Testing cost = ',testing_cost)
    print ('Absolute mean square loss difference: ', abs(final_training_cost  - testing_cost))

    plt.plot(test_X,test_Y,'bo',label='Testing data')

    plt.plot(test_X, tf_session.run(W) * test_X + tf_session.run(b), label = 'Fitted line')

I'm at a real loss to figure out why the optimizer isn't working correctly, so if anyone can point me in the right direction, I'd be very grateful.


Upvotes: 1

Views: 355

Answers (1)

Lucas Ramos

Reputation: 449

It might be related to your learning rate. Try reducing it, or lowering it further after a few epochs.

For instance, if you're using 100 epochs, try setting your learning rate to 0.01, decreasing it to 0.001 after 30 epochs, and then again to 0.0001 after another 30 or 40 epochs.
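
As a rough sketch of that schedule in plain Python: the function name `step_decay` and the boundaries at epochs 30 and 60 are only illustrative assumptions (taking "another 30" rather than 40), so adjust them to your run.

```python
def step_decay(epoch, boundaries=(30, 60), rates=(0.01, 0.001, 0.0001)):
    """Return the learning rate to use for a zero-based epoch index.

    rates[0] applies before boundaries[0], rates[1] before boundaries[1],
    and the last rate applies from the final boundary onwards.
    """
    for boundary, rate in zip(boundaries, rates):
        if epoch < boundary:
            return rate
    return rates[-1]


# Show the rate at a few epochs of a hypothetical 100-epoch run.
for epoch in (0, 29, 30, 59, 60, 99):
    print(epoch, step_decay(epoch))
```

You could then pass the returned value to the optimizer through a placeholder for the learning rate. TensorFlow 1.x also has `tf.train.piecewise_constant`, which may be a more direct way to get the same effect.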

You can look at how common architectures such as AlexNet schedule their learning rate updates to get an idea.
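
Separately from the learning rate, it may also be worth checking the shapes that go into the cost. The sketch below uses NumPy with made-up numbers (not the housing data) and assumes, as the question's code suggests, that the features are a column of shape (n, 1) while the labels are a flat vector of shape (n,). In that situation `prediction - Y` broadcasts to an (n, n) matrix, so the summed cost can be large even for a perfect fit. TensorFlow follows the same broadcasting rules for elementwise subtraction.

```python
import numpy as np

# Made-up data: a column of features and a flat vector of labels.
x = np.array([[-1.0], [0.0], [1.0]])   # shape (3, 1), like StandardScaler output
y = np.array([1.0, 2.0, 3.0])          # shape (3,), like a pandas Series of labels
n = x.shape[0]

# A line that fits these points exactly: y = 1 * x + 2.
w, b = 1.0, 2.0
prediction = x * w + b                 # shape (3, 1)

# (3, 1) - (3,) broadcasts to (3, 3): every prediction minus every label.
mismatched = prediction - y
# Reshaping the labels to a column keeps the difference per sample.
aligned = prediction - y.reshape(-1, 1)

cost_mismatched = np.sum(mismatched ** 2) / (2 * n)
cost_aligned = np.sum(aligned ** 2) / (2 * n)

print(mismatched.shape, cost_mismatched)   # (3, 3) 2.0
print(aligned.shape, cost_aligned)         # (3, 1) 0.0
```

If this is what is happening, reshaping the labels to a column before feeding them, for example with `train_Y.values.reshape(-1, 1)`, would be one way to rule it out.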

Good Luck

Upvotes: 1
