Reputation: 45
I am trying to implement the gradient descent algorithm to minimize a cost function for multiple linear regression. I am using the concepts explained in the machine learning class by Andrew Ng, and I am working in Octave. However, when I try to execute the code it fails to produce a solution: my theta values all compute to "NaN". I have attached the cost function code and the gradient descent code. Can someone please help?
Cost function:
function J = computeCostMulti(X, y, theta)
  m = length(y);           % number of training examples
  J = 0;
  h = (X*theta);           % hypothesis: m x 1 vector of predictions
  s = sum((h-y).^2);       % sum of squared errors
  J = s/(2*m);
end
Gradient Descent Code:
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
  m = length(y);                        % number of training examples
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    a = X*theta - y;                    % error vector (m x 1)
    b = alpha*(X'*a);                   % scaled gradient (n x 1)
    theta = theta - (b/m);              % simultaneous update of all parameters
    J_history(iter) = computeCostMulti(X, y, theta);
  end
end
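For context, here is a minimal sketch of how I wire these functions together (the data file name and the alpha/num_iters values below are just placeholders, not my actual values):

% minimal driver sketch (placeholder file name and hyperparameters)
data = load('data.txt');         % placeholder file: features in first columns, target in last
X = data(:, 1:end-1);
y = data(:, end);
m = length(y);

X = [ones(m, 1), X];             % add intercept column
theta = zeros(size(X, 2), 1);    % initialize parameters to zero

alpha = 0.01;                    % placeholder learning rate
num_iters = 400;                 % placeholder iteration count
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);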
Upvotes: 1
Views: 2193
Reputation: 105
I implemented this algorithm in GNU Octave and split it into 2 different functions. First you need to define a gradient function:
function [thetaNew] = compute_gradient (X, y, theta, m)
  % theta is a 1 x n row vector; the result is the n x 1 gradient of the cost
  thetaNew = (X'*(X*theta'-y))*1/m;
end
Then, to run the gradient descent algorithm, use a different function:
function [theta] = gd (X, y, alpha, num_iters)
  theta = zeros(1,columns(X));    % start from all-zero parameters (row vector)
  for iter = 1:num_iters,
    theta = theta - alpha*compute_gradient(X,y,theta,rows(y))';   % batch gradient step
  end
end
Edit 1: This algorithm works for both multiple linear regression (multiple independent variables) and linear regression with a single independent variable. I tested it with this dataset:
age height weight
41 62 115
21 62 140
31 62 125
21 64 125
31 64 145
41 64 135
41 72 165
31 72 190
21 72 175
31 66 150
31 66 155
21 64 140
For this example we want to predict
predicted weight = theta0 + theta1*age + theta2*height
I used these input values for alpha and num_iters
alpha=0.00037
num_iters=3000000
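To make the setup concrete, here is a minimal sketch of how gd can be called on that dataset (the variable name data is mine; the important detail is the leading column of ones for theta0):

% columns: age, height, weight (the 12 rows listed above)
data = [41 62 115; 21 62 140; 31 62 125; 21 64 125; 31 64 145; 41 64 135;
        41 72 165; 31 72 190; 21 72 175; 31 66 150; 31 66 155; 21 64 140];

X = [ones(rows(data), 1), data(:, 1:2)];   % intercept column, then age and height
y = data(:, 3);                            % weight is the target

alpha = 0.00037;
num_iters = 3000000;
theta = gd(X, y, alpha, num_iters)         % theta0, theta1 (age), theta2 (height)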
The output of running gradient descent for this experiment is as follows:
theta =
-170.10392 -0.40601 4.99799
So the equation is
predicted weight = -170.10392 - .406*age + 4.997*height
This is very close to the true minimum of the cost, since the results for this problem computed with PSPP (an open source alternative to SPSS) are
predicted weight = -175.17 - .40*age + 5.07*height
Hope this helps to confirm that the gradient descent algorithm works the same way for multiple linear regression as for standard linear regression.
Upvotes: 2
Reputation: 45
I did find the bug, and it was not in the logic of the cost function or the gradient descent function, but in the feature normalization logic: I was accidentally returning the wrong variable, which was causing the output to be "NaN".
It was a dumb mistake.
What I was doing previously:
mu = mean(a);
sigma = std(a);
b = (X.-mu);
X = b./sigma;    % normalized result assigned to X, so the returned X_norm is never updated
Instead, this is what I should be doing:
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
% FEATURENORMALIZE(X) returns a normalized version of X where
% the mean value of each feature is 0 and the standard deviation
% is 1. This is often a good preprocessing step to do when
% working with learning algorithms.
% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));
% ====================== YOUR CODE HERE ======================
mu= mean(X);
sigma = std(X);
a=(X.-mu);
X_norm= a./sigma;
% ============================================================
end
So clearly I should be using X_norm instead of X, and that is what was causing the code to give the wrong output.
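For completeness, here is a rough sketch of how the corrected featureNormalize fits into the rest of the pipeline; note that mu and sigma are needed again to normalize any new example before predicting (the alpha, num_iters, and x_new values below are placeholders):

% assuming X (features) and y (target) are already loaded
[X_norm, mu, sigma] = featureNormalize(X);       % normalize, keep mu and sigma
X_train = [ones(size(X_norm, 1), 1), X_norm];    % add intercept column after normalizing
theta = zeros(size(X_train, 2), 1);
[theta, J_history] = gradientDescentMulti(X_train, y, theta, 0.01, 400);   % placeholder alpha/iters

% reuse the training mu and sigma when normalizing a new example for prediction
x_new = [1200 3];                                % placeholder feature values (two features assumed)
x_new_norm = (x_new - mu) ./ sigma;
prediction = [1, x_new_norm] * theta;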
Upvotes: 1