Reputation: 1
m = length(y);                         % number of training examples
J_history = zeros(num_iters, 1);       % preallocate the cost history
for iter = 1:num_iters
    % Vectorized batch gradient descent: update all theta values simultaneously
    theta = theta - (alpha / m) * X' * (X * theta - y);
    J_history(iter) = computeCostMulti(X, y, theta);
end
function J = computeCostMulti(X, y, theta)
    m = length(y);                     % number of training examples
    % Vectorized squared-error cost: J = 1/(2m) * sum((X*theta - y).^2)
    J = 1 / (2 * m) * (X * theta - y)' * (X * theta - y);
end
theta = pinv(X' * X) * X' * y;         % Normal Equation (closed-form solution)
These two implementations converge to different values of theta for the same X and y. The normal equation gives the correct answer, but gradient descent gives a wrong one.
Is there anything wrong with the implementation of Gradient Descent?
Upvotes: 0
Views: 3082
Reputation: 1
If you normalized the training data before running gradient descent, you must apply the same normalization to new input data at prediction time (see the sketch below). Concretely, your new input should look like:
[1, (x-mu)/sigma]
where:

- 1 is the bias term
- mu is the mean of the training data
- sigma is the standard deviation of the training data
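As a rough sketch (Octave), where mu, sigma, and theta are assumed to come from your training run and the feature values are made up:

% Predicting for a new example after training on normalized features.
x_new = [1650, 3];                    % hypothetical raw feature values
x_norm = (x_new - mu) ./ sigma;       % reuse the TRAINING mean and std
prediction = [1, x_norm] * theta;     % prepend the bias term, then predict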
Upvotes: 0
Reputation: 21
It doesn't matter. Since you don't apply feature scaling when using the normal equation, you'll find that the predictions are the same either way.
Upvotes: 2
Reputation: 11424
Nobody promised you that gradient descent with a fixed step size will converge within num_iters iterations, even to a local optimum. You need to iterate until some well-defined convergence criterion is met (e.g. the gradient is close to zero).
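For instance, a minimal sketch of such a stopping rule (Octave; the tolerance tol and the iteration cap max_iters are illustrative choices, not part of the original code):

tol = 1e-9;                           % illustrative convergence tolerance
max_iters = 1e6;                      % safety cap on the number of iterations
for iter = 1:max_iters
    grad = (1 / m) * X' * (X * theta - y);
    if norm(grad) < tol               % stop once the gradient is near zero
        break;
    end
    theta = theta - alpha * grad;
end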
Upvotes: 0
Reputation: 4558
I suppose that when you use gradient descent, you first preprocess your input with feature scaling. That is not done with the normal equation method (feature scaling is not required there), and that results in a different theta. If you use both models to make predictions, they should come up with the same result.
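A quick way to check this, as a sketch (Octave; theta_gd, theta_ne, and the mu / sigma used for scaling are assumed to come from your two training runs, and the example values are made up):

x_raw  = [1, 1650, 3];                        % new example for the normal equation model
x_norm = [1, (x_raw(2:end) - mu) ./ sigma];   % same example, scaled like the training data
disp(x_norm * theta_gd);                      % prediction from gradient descent
disp(x_raw * theta_ne);                       % prediction from the normal equation
% The two printed values should agree (up to numerical error).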
Upvotes: 3