Reputation: 226
I've implemented linear regression and gradient descent from scratch, and it gives me weird results, such as very small negative numbers.
Sample data
609.0,241.0
629.0,222.0
620.0,233.0
564.0,207.0
645.0,247.0
493.0,189.0
606.0,226.0
672.0,231.0
778.0,263.0
This is the Gray Kangaroos dataset; the sample data can be found at http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/slr/frames/frame.html
import numpy as np
import matplotlib.pyplot as plt
# loading data from a csv file
x_dataset = np.array(data[0],dtype = np.float64)
y_dataset = np.array(data[1],dtype = np.float64)
m = len(y_dataset)
theta = np.array([ 0 for z in range(len(x_dataset))],dtype = np.float64)
theta[0] = 0.5
theta[1] = 0.3
def hypothesis(x, theta_hyp):
    hyp = np.dot(theta_hyp.T, x)
    return hyp
def gradient(theta, x, y, numIter=30, alpha=0.00000001):
    for i in range(numIter):
        loss = y - hypothesis(x, theta)
        error = np.sum(loss**2)/2*m
        print("Cost : {0} at {1} iteration".format(error, i))
        # just to plot the cost function
        #cost_list.append(error)
        #iter_list.append(i)
        gradientD = np.dot(x.T, loss)
        # here if I subtract it gives me negative results
        theta = theta - alpha*gradientD
    return theta
After playing with the problem, I figured out that if theta is negative the cost function increases, and if theta is positive it decreases. I wanted the cost function to decrease, so I changed the code a bit (see the sanity check after the snippet below), which gave me a positive theta and a decreasing cost function.
# adding gives +ve theta
theta = theta + alpha*gradientD
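As a sanity check on the sign: with loss = y - h(x), the gradient of the cost (1/(2m))·sum((y - h)^2) is -(1/m)·x.T·(y - h), so gradientD = np.dot(x.T, loss) is the negative gradient, and adding it is the descent direction. Here's a tiny numerical comparison (just a sketch with a made-up design matrix X that has a bias column, not the real data):

import numpy as np

# Tiny made-up design matrix with a bias column, plus matching targets
X = np.array([[1.0, 609.0],
              [1.0, 629.0],
              [1.0, 564.0]])
y = np.array([241.0, 222.0, 207.0])
m = len(y)

def cost(theta):
    loss = y - X.dot(theta)               # residual, y - h(x)
    return np.sum(loss ** 2) / (2 * m)

theta = np.array([0.5, 0.3])

# Analytic gradient of the cost above: note the leading minus sign
analytic = -X.T.dot(y - X.dot(theta)) / m

# Central-difference approximation, one coordinate at a time
eps = 1e-4
numeric = np.array([
    (cost(theta + eps * np.eye(2)[j]) - cost(theta - eps * np.eye(2)[j])) / (2 * eps)
    for j in range(2)
])
print(analytic)  # agrees with numeric, minus sign included
print(numeric)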
I plotted the graph of the cost function. After training it gives me some weights, but when I use those weights to predict y, the predictions aren't good, and when I plot the regression line on the graph it doesn't fit the data at all.
I'm still learning this stuff and I'm not sure whether my implementation is right. Also, my learning rate is really small; everywhere else I've seen learning rates no smaller than 0.001, but when I use 0.001 the cost function comes out inaccurate.
I'm not sure if I've explained this clearly, but I'd really appreciate your help.
Upvotes: 0
Views: 264
Reputation: 866
You've got error and loss defined backwards: the error is the difference between the prediction and the data, and the loss function maps that error onto an objective for a fit routine. Your gradient calculation is roughly correct (although it isn't scaled consistently with how you've defined the loss function, and the "loss" term in the gradient calculation is actually the error).
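To make the naming and scaling concrete, here's a minimal sketch of the loop (my names, not yours; note also that sum(loss**2)/2*m in your code divides by 2 and then multiplies by m due to operator precedence, rather than dividing by 2*m):

import numpy as np

def gradient_descent(X, y, theta, alpha=0.01, num_iter=30):
    # Batch gradient descent on the mean-squared-error objective
    m = len(y)
    history = []
    for i in range(num_iter):
        error = X.dot(theta) - y             # prediction minus data
        loss = np.sum(error ** 2) / (2 * m)  # scalar objective; note the parentheses
        history.append(loss)
        grad = X.T.dot(error) / m            # gradient of the loss above
        theta = theta - alpha * grad         # subtracting now moves downhill
    return theta, history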
However, your value of alpha (the step size) is extremely small, which will slow convergence. Because you only allow 30 iterations, it may not converge: it clearly starts in a really bad place, with loss around 6e7, and it's not clear from the scale of the graph how close to zero it gets by the 30th iteration. Try upping the alpha value to see whether the loss gets closer to its final value within the 30 iterations allowed (judging by the end-state loss value). Right now your graph of loss vs. iteration is swamped by the very high initial loss; plotting the log (or log10) of the loss will make it easier to compare across experiments.
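For the log-scale plot, matplotlib's semilogy does it directly, along these lines (assuming history holds one loss value per iteration, like the list returned by the sketch above):

import matplotlib.pyplot as plt

# 'history' is assumed to hold one loss value per iteration
plt.semilogy(history)           # log-scaled y-axis makes the decay visible
plt.xlabel("iteration")
plt.ylabel("loss (log scale)")
plt.show()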
Upvotes: 0