Reputation: 108
I actually want to use a linear model to fit a set of 'sin' data, but it turns out the loss function goes larger during each iteration. Is there any problem with my code below ? (gradient descent method)
Here is my code in Matlab
m=20;
rate = 0.1;
x = linspace(0,2*pi,20);
x = [ones(1,length(x));x]
y = sin(x);
w = rand(1,2);
for i=1:500
h = w*x;
loss = sum((h-y).^2)/m/2
total_loss = [total_loss loss];
**gradient = (h-y)*x'./m ;**
w = w - rate.*gradient;
end
Here is the data I want to fit
Upvotes: 1
Views: 970
Reputation: 104503
There isn't a problem with your code. With your current framework, if you can define data in the form of y = m*x + b
, then this code is more than adequate. I actually ran it through a few tests where I define an equation of the line and add some Gaussian random noise to it (amplitude = 0.1, mean = 0, std. dev = 1).
However, one problem I will mention to you is that if you take a look at your sinusoidal data, you define a domain between [0,2*pi]
. As you can see, you have multiple x
values that get mapped to the same y
value but of different magnitude. For example, at x = pi/2
we get 1 but at x = -3*pi/2
we get -1. This high variability will not bode well with linear regression, and so one suggestion I have is to restrict your domain... so something like [0, pi]
. Another reason why it probably doesn't converge is the learning rate you chose is too high. I'd set it to something low like 0.01
. As you mentioned in your comments, you already figured that out!
However, if you want to fit non-linear data using linear regression, you're going to have to include higher order terms to account for the variability. As such, try including second order and/or third order terms. This can simply be done by modifying your x
matrix like so:
x = [ones(1,length(x)); x; x.^2; x.^3];
If you recall, the hypothesis function can be represented as a summation of linear terms:
h(x) = theta0 + theta1*x1 + theta2*x2 + ... + thetan*xn
In our case, each theta
term would build a higher order term of our polynomial. x2
would be x^2
and x3
would be x^3
. Therefore, we can still use the definition of gradient descent for linear regression here.
I'm also going to control the random generation seed (via rng
) so that you can produce the same results I have gotten:
clear all;
close all;
rng(123123);
total_loss = [];
m = 20;
x = linspace(0,pi,m); %// Change
y = sin(x);
w = rand(1,4); %// Change
rate = 0.01; %// Change
x = [ones(1,length(x)); x; x.^2; x.^3]; %// Change - Second and third order terms
for i=1:500
h = w*x;
loss = sum((h-y).^2)/m/2;
total_loss = [total_loss loss];
% gradient is now in a different expression
gradient = (h-y)*x'./m ; % sum all in each iteration, it's a batch gradient
w = w - rate.*gradient;
end
If we try this, we get for w
(your parameters):
>> format long g;
>> w
w =
Columns 1 through 3
0.128369521905694 0.819533906064327 -0.0944622478526915
Column 4
-0.0596638117151464
My final loss after this point is:
loss =
0.00154350916582836
This means that our equation of the line is:
y = 0.12 + 0.819x - 0.094x^2 - 0.059x^3
If we plot this equation of the line with your sinusoidal data, this is what we get:
xval = x(2,:);
plot(xval, y, xval, polyval(fliplr(w), xval))
legend('Original', 'Fitted');
Upvotes: 1