Reputation: 45
I am trying to implement the gradient descent algorithm to minimize a cost function for multiple linear regression. I am using the concepts explained in the machine learning class by Andrew Ng, and I am working in Octave. However, when I try to execute the code it fails to produce a solution: my theta values all compute to "NaN". I have attached the cost function code and the gradient descent code. Can someone please help?
Cost function:
function J = computeCostMulti(X, y, theta)
  m = length(y);           % number of training examples
  J = 0;
  h = (X*theta);           % hypothesis: m x 1 vector of predictions
  s = sum((h-y).^2);       % sum of squared errors
  J = s/(2*m);
end
Gradient Descent Code:
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
  m = length(y);                        % number of training examples
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    a = X*theta - y;                    % error vector (m x 1)
    b = alpha*(X'*a);                   % scaled gradient (n x 1)
    theta = theta - (b/m);              % simultaneous update of all parameters
    J_history(iter) = computeCostMulti(X, y, theta);
  end
end
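For context, here is a minimal sketch of how I wire these functions together (the data file name and the alpha/num_iters values below are just placeholders, not my actual values):

% minimal driver sketch (placeholder file name and hyperparameters)
data = load('data.txt');         % placeholder file: features in first columns, target in last
X = data(:, 1:end-1);
y = data(:, end);
m = length(y);

X = [ones(m, 1), X];             % add intercept column
theta = zeros(size(X, 2), 1);    % initialize parameters to zero

alpha = 0.01;                    % placeholder learning rate
num_iters = 400;                 % placeholder iteration count
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);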
Upvotes: 1
Views: 2193
Reputation: 105
I implemented this algorithm in GNU Octave and split it into 2 different functions. First you need to define a gradient function:
function [thetaNew] = compute_gradient (X, y, theta, m)
  % theta is a 1 x n row vector; the result is the n x 1 gradient of the cost
  thetaNew = (X'*(X*theta'-y))*1/m;
end
Then, to run the gradient descent algorithm, use a different function:
function [theta] = gd (X, y, alpha, num_iters)
  theta = zeros(1,columns(X));    % start from all-zero parameters (row vector)
  for iter = 1:num_iters,
    theta = theta - alpha*compute_gradient(X,y,theta,rows(y))';   % batch gradient step
  end
end
Edit 1: This algorithm works for both multiple linear regression (multiple independent variables) and linear regression with a single independent variable. I tested it with this dataset:
age height weight
41 62 115
21 62 140
31 62 125
21 64 125
31 64 145
41 64 135
41 72 165
31 72 190
21 72 175
31 66 150
31 66 155
21 64 140
For this example we want to predict
predicted weight = theta0 + theta1*age + theta2*height
I used these input values for alpha and num_iters
alpha=0.00037
num_iters=3000000
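To make the setup concrete, here is a minimal sketch of how gd can be called on that dataset (the variable name data is mine; the important detail is the leading column of ones for theta0):

% columns: age, height, weight (the 12 rows listed above)
data = [41 62 115; 21 62 140; 31 62 125; 21 64 125; 31 64 145; 41 64 135;
        41 72 165; 31 72 190; 21 72 175; 31 66 150; 31 66 155; 21 64 140];

X = [ones(rows(data), 1), data(:, 1:2)];   % intercept column, then age and height
y = data(:, 3);                            % weight is the target

alpha = 0.00037;
num_iters = 3000000;
theta = gd(X, y, alpha, num_iters)         % theta0, theta1 (age), theta2 (height)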
The output of running gradient descent for this experiment is as follows:
theta =
-170.10392 -0.40601 4.99799
So the equation is
predicted weight = -170.10392 - .406*age + 4.997*height
This is very close to the true minimum of the cost, since the results for this problem computed with PSPP (an open source alternative to SPSS) are
predicted weight = -175.17 - .40*age + 5.07*height
Hope this helps to confirm that the gradient descent algorithm works the same way for multiple linear regression as for standard linear regression.
Upvotes: 2
Reputation: 45
I did find the bug, and it was not in the logic of the cost function or the gradient descent function, but in the feature normalization logic: I was accidentally returning the wrong variable, which was causing the output to be "NaN".
It was a dumb mistake.
What I was doing previously:
mu = mean(a);
sigma = std(a);
b = (X.-mu);
X = b./sigma;    % normalized result assigned to X, so the returned X_norm is never updated
Instead, this is what I should be doing:
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
% FEATURENORMALIZE(X) returns a normalized version of X where
% the mean value of each feature is 0 and the standard deviation
% is 1. This is often a good preprocessing step to do when
% working with learning algorithms.
% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));
% ====================== YOUR CODE HERE ======================
mu= mean(X);
sigma = std(X);
a=(X.-mu);
X_norm= a./sigma;
% ============================================================
end
So clearly I should be using X_norm instead of X, and that is what was causing the code to give the wrong output.
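For completeness, here is a rough sketch of how the corrected featureNormalize fits into the rest of the pipeline; note that mu and sigma are needed again to normalize any new example before predicting (the alpha, num_iters, and x_new values below are placeholders):

% assuming X (features) and y (target) are already loaded
[X_norm, mu, sigma] = featureNormalize(X);       % normalize, keep mu and sigma
X_train = [ones(size(X_norm, 1), 1), X_norm];    % add intercept column after normalizing
theta = zeros(size(X_train, 2), 1);
[theta, J_history] = gradientDescentMulti(X_train, y, theta, 0.01, 400);   % placeholder alpha/iters

% reuse the training mu and sigma when normalizing a new example for prediction
x_new = [1200 3];                                % placeholder feature values (two features assumed)
x_new_norm = (x_new - mu) ./ sigma;
prediction = [1, x_new_norm] * theta;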
Upvotes: 1