Reputation: 71
I'm trying to implement linear regression with a single variable (exercise 1 from Stanford's machine learning course on Coursera).
My understanding is that this is the math:
Now, my implementation looks like this:
for iter = 1:num_iters
  temp1 = theta(1) - alpha * sum(X * theta - y) / m;
  temp2 = theta(2) - alpha * sum( (X * theta - y) .* X(2) ) / m;
  theta(1) = temp1;
  theta(2) = temp2;
end
where
I tried doing this manually with a small example (m = 4), and I think my code is right... but it obviously isn't, or I wouldn't be writing here. When I run the algorithm, I get a different theta back depending on the initial theta I pass to the function, and if I plot the cost function it is obviously wrong for certain values of theta (not all):
That probably means I don't really understand the math (which would also explain why everyone else on Stack Overflow uses a transpose and I don't); the problem is that I don't know which part I'm having trouble with.
I'd really appreciate some insights, but I'd like to complete the exercise on my own. Basically, I'm looking for help, not for the complete solution.
EDIT: Apparently it was not a logical error but a semantic one. When assigning temp2, I should have written (X * theta - y) .* X(:,2)
instead of (X * theta - y) .* X(2).
Basically, I was not selecting the second column of X (which is an m×2 matrix), but a single scalar element (due to Octave's indexing syntax).
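For comparison, here is the same corrected update sketched in Python/NumPy rather than Octave (the toy dataset and learning rate below are made up for illustration); the key point is selecting the whole second column, not a single scalar element:

```python
import numpy as np

# Toy data: 4 examples of y = 1 + 2*x, with a column of ones for the intercept.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])   # m x 2 design matrix
y = 1.0 + 2.0 * x
m = len(y)

alpha = 0.1
theta = np.zeros(2)

for _ in range(2000):
    err = X @ theta - y
    # Simultaneous update: compute both temporaries before assigning either.
    temp1 = theta[0] - alpha * err.sum() / m
    temp2 = theta[1] - alpha * (err * X[:, 1]).sum() / m   # whole column X[:, 1]
    theta = np.array([temp1, temp2])

print(theta)  # approaches [1, 2]
```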
Upvotes: 2
Views: 7280
Reputation: 321
According to the gradient descent algorithm, you have to update the values of theta(1) and theta(2) simultaneously. You cannot update the value of theta(1) first and then calculate the value of theta(2) using the updated theta(1).
Check this code for better understanding:
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
  x = X(:,2);
  h = theta(1) + (theta(2) * x);
  theta_zero = theta(1) - alpha * (1/m) * sum(h - y);
  theta_one  = theta(2) - alpha * (1/m) * sum((h - y) .* x);
  theta = [theta_zero; theta_one];
  J_history(iter) = computeCost(X, y, theta);
end
Here both theta(1) and theta(2) are updated simultaneously: the temporaries theta_zero and theta_one are computed from the old theta before either is assigned. The gradient descent algorithm is defined as: repeat until convergence, updating the values of theta simultaneously.
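To see numerically why the order matters, here is a small NumPy sketch (toy data made up for illustration) comparing one sequential update step, where the new theta(1) leaks into the theta(2) update, against one simultaneous step:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
m, alpha = len(y), 0.1
theta0 = np.array([0.5, 0.5])  # arbitrary starting point

# Simultaneous: both gradient components use the same old theta.
err = X @ theta0 - y
simul = theta0 - alpha / m * np.array([err.sum(), (err * X[:, 1]).sum()])

# Sequential: theta(2)'s gradient is computed with the already-updated theta(1).
seq = theta0.copy()
seq[0] = seq[0] - alpha / m * (X @ seq - y).sum()
seq[1] = seq[1] - alpha / m * ((X @ seq - y) * X[:, 1]).sum()

print(simul, seq)  # first components match; second components differ
```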
Upvotes: 1
Reputation: 11
Please try this (Linear Regression with one variable):
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
  % Normal Equation (closed-form alternative):
  % theta = pinv(X'*X)*X'*y;
  predictions = X * theta;
  delta = (1/m) * X' * (predictions - y);
  theta = theta - alpha * delta;
  % Save the cost J in every iteration
  J_history(iter) = computeCost(X, y, theta);
end
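The commented-out normal equation is the closed-form least-squares solution; a quick NumPy sketch (with a made-up toy dataset) shows it agrees with what the vectorized gradient descent loop converges to:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x          # exact linear fit, so theta should recover [1, 2]
m, alpha = len(y), 0.1

# Closed form: theta = pinv(X'X) X' y
theta_closed = np.linalg.pinv(X.T @ X) @ X.T @ y

# Vectorized gradient descent, mirroring the delta update in the loop above
theta = np.zeros(2)
for _ in range(5000):
    theta -= alpha / m * (X.T @ (X @ theta - y))

print(theta_closed, theta)  # both approach [1, 2]
```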
Upvotes: 1
Reputation: 4652
I just looked at the course briefly, and it looks like you are mostly on the right track. Here is a starting place that you can work from:
for iter = 1:num_iters
  temp1 = theta(1) - alpha * sum( (theta(1) + theta(2).*X) - y ) / m;
  temp2 = theta(2) - alpha * sum( ((theta(1) + theta(2).*X) - y) .* X ) / m;
  theta(1) = temp1;  % assign only after both gradients are computed
  theta(2) = temp2;
end
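The two per-parameter sums here are exactly the components of the vectorized gradient X'(X*theta - y)/m used in the other answers; a NumPy sketch (toy values assumed) confirms the element-wise and vectorized forms agree:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # raw feature column, as X in the loop above
y = np.array([2.0, 4.5, 6.0, 9.0])
m = len(y)
theta = np.array([0.3, 0.7])

# Element-wise form, mirroring the two update lines
h = theta[0] + theta[1] * x
grad_elem = np.array([(h - y).sum() / m, ((h - y) * x).sum() / m])

# Vectorized form with an explicit design matrix [1, x]
Xd = np.column_stack([np.ones_like(x), x])
grad_vec = Xd.T @ (Xd @ theta - y) / m

print(grad_elem, grad_vec)  # identical gradients
```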
Upvotes: 3