Reputation: 360
I have attempted to implement an iterative (loop-based) version of the gradient descent algorithm, but it is not working correctly. The vectorized implementation of the same algorithm, however, works correctly.
Here is the iterative implementation:
function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)
    % get the number of rows and columns
    nrows = size(X, 1);
    ncols = size(X, 2);

    % initialize the hypothesis vector
    h = zeros(nrows, 1);

    % initialize the temporary theta vector
    theta_temp = zeros(ncols, 1);

    % run gradient descent for the specified number of iterations
    count = 1;
    while count <= iterations
        % calculate the hypothesis values and fill into the vector
        for i = 1 : nrows
            for j = 1 : ncols
                term = theta(j) * X(i, j);
                h(i) = h(i) + term;
            end
        end

        % calculate the gradient
        for j = 1 : ncols
            for i = 1 : nrows
                term = (h(i) - y(i)) * X(i, j);
                theta_temp(j) = theta_temp(j) + term;
            end
        end

        % update the gradient with the factor
        fact = alpha / nrows;
        for i = 1 : ncols
            theta_temp(i) = fact * theta_temp(i);
        end

        % update the theta
        for i = 1 : ncols
            theta(i) = theta(i) - theta_temp(i);
        end

        % update the count
        count += 1;
    end
end
Below is the vectorized implementation of the same algorithm:
function [theta, theta_all, J_cost] = gradientDescent(X, y, theta, alpha)
    % set the learning rate
    learn_rate = alpha;

    % set the number of iterations
    n = 1500;

    % number of training examples
    m = length(y);

    % initialize the theta_new vector
    l = length(theta);
    theta_new = zeros(l, 1);

    % initialize the cost vector
    J_cost = zeros(n, 1);

    % initialize the matrix to store all the calculated theta values
    theta_all = zeros(n, 2);

    % perform gradient descent for the specified number of iterations
    for i = 1 : n
        % calculate the hypothesis
        hypothesis = X * theta;

        % calculate the error
        err = hypothesis - y;

        % calculate the gradient
        grad = X' * err;

        % calculate the update step
        theta_new = (learn_rate / m) .* grad;

        % update the old theta
        theta = theta - theta_new;

        % update the cost
        J_cost(i) = computeCost(X, y, theta);

        % store the calculated theta value
        if i < n
            index = i + 1;
            theta_all(index, :) = theta';
        end
    end
end
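For reference, computeCost (called above) is the course's cost function and is not shown here; a minimal sketch of the standard squared-error cost it computes (not necessarily the exact assignment file) would be:
% minimal sketch of the standard squared-error cost that computeCost returns
function J = computeCost(X, y, theta)
    m = length(y);
    J = sum((X * theta - y) .^ 2) / (2 * m);
end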
A link to the dataset can be found here.
The filename is ex1data1.txt.
ISSUES
For an initial theta = [0, 0] (this is a vector!), a learning rate of 0.01, and 1500 iterations, I get the optimal theta as:
The above is the output of the vectorized implementation, which I know I have implemented correctly (it passed all the test cases on Coursera).
However, when I implement the same algorithm using the iterative method (the first code above), the theta values I get are (alpha = 0.01, iterations = 1500):
This implementation fails the test cases, so I know it is incorrect.
I am, however, unable to understand where I am going wrong, since the iterative code does the same job and the same multiplications as the vectorized one. When I traced one iteration of both versions by hand (on pen and paper!), the values came out the same, but they differ when I run them in Octave.
Any help would be greatly appreciated, especially if you could point out where I went wrong and what exactly caused the failure.
Points to consider
Additionally, here is the code for pre-processing the data:
function [X, y] = fileReader(filename)
    % load the dataset
    dataset = load(filename);

    % get the dimensions of the dataset
    nrows = size(dataset, 1);
    ncols = size(dataset, 2);

    % generate the X matrix from the dataset
    X = dataset(:, 1 : ncols - 1);

    % generate the y vector
    y = dataset(:, ncols);

    % append 1's to the X matrix
    X = [ones(nrows, 1), X];
end
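In case it helps, this is roughly how I am running and comparing both implementations (a minimal sketch, assuming ex1data1.txt is in the current working directory):
% rough sketch of how both implementations are called and compared
[X, y] = fileReader('ex1data1.txt');
theta0 = zeros(2, 1);

theta_iter = gradientDescent_i(X, y, theta0, 0.01, 1500);
theta_vect = gradientDescent(X, y, theta0, 0.01);   % runs its 1500 iterations internally

disp([theta_iter, theta_vect]);   % the two columns should agree, but they do not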
Upvotes: 0
Views: 216
Reputation: 26
What is going wrong with the first code is that the theta_temp and h vectors are not being initialised properly. For the very first iteration (when count equals 1) your code runs properly, because for that particular iteration the h and theta_temp vectors have been initialised to 0. However, since these are temporary vectors for each iteration of gradient descent, they are never reset to zero vectors for the subsequent iterations. That is, from iteration 2 onwards, the values computed into h(i) and theta_temp(i) are simply added on top of the old values. Because of that, the code does not work properly. You need to reset these vectors to zero at the beginning of each iteration, and then it will work correctly. Here is my implementation of your code (the first one; observe the changes):
function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)
    % get the number of rows and columns
    nrows = size(X, 1);
    ncols = size(X, 2);

    % run gradient descent for the specified number of iterations
    count = 1;
    while count <= iterations
        % initialize the hypothesis vector (reset every iteration)
        h = zeros(nrows, 1);

        % initialize the temporary theta vector (reset every iteration)
        theta_temp = zeros(ncols, 1);

        % calculate the hypothesis values and fill into the vector
        for i = 1 : nrows
            for j = 1 : ncols
                term = theta(j) * X(i, j);
                h(i) = h(i) + term;
            end
        end

        % calculate the gradient
        for j = 1 : ncols
            for i = 1 : nrows
                term = (h(i) - y(i)) * X(i, j);
                theta_temp(j) = theta_temp(j) + term;
            end
        end

        % update the gradient with the factor
        fact = alpha / nrows;
        for i = 1 : ncols
            theta_temp(i) = fact * theta_temp(i);
        end

        % update the theta
        for i = 1 : ncols
            theta(i) = theta(i) - theta_temp(i);
        end

        % update the count
        count += 1;
    end
end
I ran this code and it gave the same values of theta that you mentioned. What I wonder, however, is how you found the hypothesis vector to be the same in both cases, when clearly this accumulation was one of the reasons the first code was failing!
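To see the accumulation concretely, here is a tiny sketch (with made-up numbers, and theta deliberately kept fixed just to make the doubling obvious) that mimics what happens to h when it is not reset between iterations:
% tiny illustration of the accumulation bug (hypothetical numbers)
X = [1 2; 1 3];          % two examples: bias column plus one feature
theta = [0.5; 0.5];
h = zeros(2, 1);

for count = 1 : 2        % two "iterations" without resetting h
    for i = 1 : 2
        for j = 1 : 2
            h(i) = h(i) + theta(j) * X(i, j);
        end
    end
    disp(h');            % prints [1.5 2.0] after iteration 1, [3.0 4.0] after iteration 2
end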
Upvotes: 1