Reputation: 226
I've implemented linear regression and gradient descent from scratch, and it gives me weird results, such as very small negative numbers.
Sample data
609.0,241.0
629.0,222.0
620.0,233.0
564.0,207.0
645.0,247.0
493.0,189.0
606.0,226.0
672.0,231.0
778.0,263.0
This is the Gray Kangaroos dataset; the sample data can be found at http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/slr/frames/frame.html
import numpy as np
import matplotlib.pyplot as plt
# loading data from a csv file
x_dataset = np.array(data[0],dtype = np.float64)
y_dataset = np.array(data[1],dtype = np.float64)
m = len(y_dataset)
theta = np.array([ 0 for z in range(len(x_dataset))],dtype = np.float64)
theta[0] = 0.5
theta[1] = 0.3
def hypothesis(x, theta_hyp):
    hyp = np.dot(theta_hyp.T, x)
    return hyp
def gradient(theta, x, y, numIter=30, alpha=0.00000001):
    for i in range(numIter):
        loss = y - hypothesis(x, theta)
        error = np.sum(loss**2)/2*m
        print("Cost : {0} at {1} iteration".format(error, i))
        # just to plot the cost function
        #cost_list.append(error)
        #iter_list.append(i)
        gradientD = np.dot(x.T, loss)
        # here if I subtract it gives me negative results
        theta = theta - alpha*gradientD
    return theta
After playing with the problem, I figured out that if theta is negative the cost function increases, and if theta is positive it decreases. I wanted the cost function to decrease, so I changed the code a bit (see the sanity check after the snippet below), which gave me a positive theta and a decreasing cost function.
# adding gives +ve theta
theta = theta + alpha*gradientD
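As a sanity check on the sign: with loss = y - h(x), the gradient of the cost (1/(2m))·sum((y - h)^2) is -(1/m)·x.T·(y - h), so gradientD = np.dot(x.T, loss) is the negative gradient, and adding it is the descent direction. Here's a tiny numerical comparison (just a sketch with a made-up design matrix X that has a bias column, not the real data):

import numpy as np

# Tiny made-up design matrix with a bias column, plus matching targets
X = np.array([[1.0, 609.0],
              [1.0, 629.0],
              [1.0, 564.0]])
y = np.array([241.0, 222.0, 207.0])
m = len(y)

def cost(theta):
    loss = y - X.dot(theta)               # residual, y - h(x)
    return np.sum(loss ** 2) / (2 * m)

theta = np.array([0.5, 0.3])

# Analytic gradient of the cost above: note the leading minus sign
analytic = -X.T.dot(y - X.dot(theta)) / m

# Central-difference approximation, one coordinate at a time
eps = 1e-4
numeric = np.array([
    (cost(theta + eps * np.eye(2)[j]) - cost(theta - eps * np.eye(2)[j])) / (2 * eps)
    for j in range(2)
])
print(analytic)  # agrees with numeric, minus sign included
print(numeric)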
I plotted the graph of the cost function. After training it gives me some weights, but when I use those weights to predict y, the predictions aren't good, and when I plot the regression line on the graph it doesn't fit the data at all.
I'm still learning this stuff and I'm not sure whether my implementation is right. Also, my learning rate is really small; everywhere else I've seen learning rates no smaller than 0.001, but when I use 0.001 the cost function comes out inaccurate.
I'm not sure if I've explained this clearly, but I'd really appreciate your help.
Upvotes: 0
Views: 264
Reputation: 866
You've got error and loss defined backwards: the error is the difference between the prediction and the data, and the loss function maps that error onto an objective for a fit routine. Your gradient calculation is roughly correct (although it isn't scaled consistently with how you've defined the loss function, and the "loss" term in the gradient calculation is actually the error).
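To make the naming and scaling concrete, here's a minimal sketch of the loop (my names, not yours; note also that sum(loss**2)/2*m in your code divides by 2 and then multiplies by m due to operator precedence, rather than dividing by 2*m):

import numpy as np

def gradient_descent(X, y, theta, alpha=0.01, num_iter=30):
    # Batch gradient descent on the mean-squared-error objective
    m = len(y)
    history = []
    for i in range(num_iter):
        error = X.dot(theta) - y             # prediction minus data
        loss = np.sum(error ** 2) / (2 * m)  # scalar objective; note the parentheses
        history.append(loss)
        grad = X.T.dot(error) / m            # gradient of the loss above
        theta = theta - alpha * grad         # subtracting now moves downhill
    return theta, history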
However, your value of alpha (the step size) is extremely small, which will slow convergence. Because you only allow 30 iterations, it may not converge: it clearly starts in a really bad place, with loss around 6e7, and it's not clear from the scale of the graph how close to zero it gets by the 30th iteration. Try upping the alpha value to see whether the loss gets closer to its final value within the 30 iterations allowed (judging by the end-state loss value). Right now your graph of loss vs. iteration is swamped by the very high initial loss; plotting the log (or log10) of the loss will make it easier to compare across experiments.
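For the log-scale plot, matplotlib's semilogy does it directly, along these lines (assuming history holds one loss value per iteration, like the list returned by the sketch above):

import matplotlib.pyplot as plt

# 'history' is assumed to hold one loss value per iteration
plt.semilogy(history)           # log-scaled y-axis makes the decay visible
plt.xlabel("iteration")
plt.ylabel("loss (log scale)")
plt.show()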
Upvotes: 0