enigma6174
enigma6174

Reputation: 360

What is the error in the iterative implementation of gradient descent algorithm below?

I have attempted to implement the iterative version of gradient descent algorithm which however is not working correctly. The vectorized implementation of the same algorithm however works correctly.
Here is the iterative implementation :

function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)

    % get the number of rows and columns
    nrows = size(X, 1);
    ncols = size(X, 2);

    % initialize the hypothesis vector
    h = zeros(nrows, 1);

    % initialize the temporary theta vector
    theta_temp = zeros(ncols, 1);

    % run gradient descent for the specified number of iterations
    count = 1;

    while count <= iterations

        % calculate the hypothesis values and fill into the vector
        for i = 1 : nrows
            for j = 1 : ncols
                term = theta(j) * X(i, j);
                h(i) = h(i) + term;
            end
        end

        % calculate the gradient
        for j = 1 : ncols
            for i = 1 : nrows
                term = (h(i) - y(i)) * X(i, j);
                theta_temp(j) = theta_temp(j) + term;
            end
        end

        % update the gradient with the factor
        fact = alpha / nrows;

        for i = 1 : ncols
            theta_temp(i) = fact * theta_temp(i);
        end

        % update the theta
        for i = 1 : ncols
            theta(i) = theta(i) - theta_temp(i);
        end

        % update the count
        count += 1;
    end
end

And below is the vectorized implementation of the same algorithm :

function [theta, theta_all, J_cost] = gradientDescent(X, y, theta, alpha)

    % set the learning rate
    learn_rate = alpha;

    % set the number of iterations
    n = 1500;

    % number of training examples
    m = length(y);

    % initialize the theta_new vector
    l = length(theta);
    theta_new = zeros(l,1);

    % initialize the cost vector
    J_cost = zeros(n,1);

    % initialize the vector to store all the calculated theta values
    theta_all = zeros(n,2);

    % perform gradient descent for the specified number of iterations
    for i = 1 : n

        % calculate the hypothesis
        hypothesis = X * theta;

        % calculate the error
        err = hypothesis - y;

        % calculate the gradient
        grad = X' * err;

        % calculate the new theta
        theta_new = (learn_rate/m) .* grad;

        % update the old theta
        theta = theta - theta_new;

        % update the cost
        J_cost(i) = computeCost(X, y, theta);

        % store the calculated theta value
        if i < n
            index = i + 1;
            theta_all(index,:) = theta';
    end
end

Link to the dataset can be found here

The filename is ex1data1.txt

ISSUES

For initial theta = [0, 0] (this is a vector!), learning rate of 0.01 and running this for 1500 iterations I get the optimal theta as :

  1. theta0 = -3.6303
  2. theta1 = 1.1664

The above is the output for the vectorized implementation which I know I have implemented correctly (it passed all the test cases on Coursera).

However, when I implemented the same algorithm using the iterative method (1st code I mentioned) the theta values I get are (alpha = 0.01, iterations = 1500):

  1. theta0 = -0.20720
  2. theta1 = -0.77392

This implementation fails to pass the test cases and I know therefore that the implementation is incorrect.

I am however unable to understand where I am going wrong as the iterative code does the same job, same multiplications as the vectorized one and when I tried to trace the output of 1 iteration of both the codes, the values came same (on pen and paper!) but failed when I ran them on Octave.

Any help regarding this would be of great help especially if you could point out where I went wrong and what exactly was the cause of failure.

Points to consider

  1. The implementation of hypothesis is correct as I tested it out and both the codes gave the same results, so no issues here.
  2. I printed the output of the gradient vector in both the codes and realised that the error lies here because the outputs here were very different!

Additionally, here is the code for pre-processing the data :

function[X, y] = fileReader(filename)

    % load the dataset
    dataset = load(filename);

    % get the dimensions of the dataset
    nrows = size(dataset, 1);
    ncols = size(dataset, 2);

    % generate the X matrix from the dataset
    X = dataset(:, 1 : ncols - 1);

    % generate the y vector
    y = dataset(:, ncols);

    % append 1's to the X matrix
    X = [ones(nrows, 1), X];
end

Upvotes: 0

Views: 216

Answers (1)

Nihal Kudligi
Nihal Kudligi

Reputation: 26

What is going wrong with the first code is that the theta_temp and the h vectors are not being initialised properly. For the very first iteration (when count value equals 1) your code runs properly because for that particular iteration the the h and the theta_temp vectors have been initialised to 0 properly. However, since these are temporary vectors for each iteration of gradient descent, they have not been initialised to 0 vectors again for the subsequent iterations. That is, for iteration 2, the values that are modified into h(i) and theta_temp(i) are just added to the old values. Hence because of that, the code does not work properly. You need to update the vectors as zero vectors at the beginning of each iteration and then they would work correctly. Here is my implementation of your code (the first one, observe the changes) :

function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)

    % get the number of rows and columns
    nrows = size(X, 1);
    ncols = size(X, 2);

    % run gradient descent for the specified number of iterations
    count = 1;

    while count <= iterations

        % initialize the hypothesis vector
        h = zeros(nrows, 1);

        % initialize the temporary theta vector
        theta_temp = zeros(ncols, 1);


        % calculate the hypothesis values and fill into the vector
        for i = 1 : nrows
            for j = 1 : ncols
                term = theta(j) * X(i, j);
                h(i) = h(i) + term;
            end
        end

        % calculate the gradient
        for j = 1 : ncols
            for i = 1 : nrows
                term = (h(i) - y(i)) * X(i, j);
                theta_temp(j) = theta_temp(j) + term;
            end
        end

        % update the gradient with the factor
        fact = alpha / nrows;

        for i = 1 : ncols
            theta_temp(i) = fact * theta_temp(i);
        end

        % update the theta
        for i = 1 : ncols
            theta(i) = theta(i) - theta_temp(i);
        end

        % update the count
        count += 1;
    end
end

I ran the code and it gave the same values of theta which you have mentioned. However, what I wonder is how did you state that the output of hypothesis vector was the same in both cases where clearly, this was one of the reasons for the first code failing!

Upvotes: 1

Related Questions