Chetan Grandhe

Reputation: 73

function returning nan in for loop when performing multivariate linear regression

I am performing multivariate linear regression in pure Python, as seen in the code below. Can someone please tell me what's wrong with this code? I have done the same for univariate linear regression and it worked well there!

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x_df=pd.DataFrame([[2.0,70.0],[3.0,30.0],[4.0,80.0],[4.0,20.0],[3.0,50.0],[7.0,10.0],[5.0,50,0],[3.0,90.0],[2.0,20.0]])
y_df=pd.DataFrame([79.4,41.5,97.5,36.1,63.2,39.5,69.8,103.5,29.5])
x_df=x_df.drop(x_df.columns[2:], axis=1)

#print(x_df)

m=len(y_df)
#print(m)

x_df['intercept']=1
X=np.array(x_df)
#print(X)
#print(X.shape)
y=np.array(y_df).flatten()
#print(y.shape)
theta=np.array([0,0,0])
#print(theta)

def hypothesis(x,theta):
    return np.dot(x,theta)

#print(hypothesis(X,theta))

def cost(x,y,theta):
    m=y.shape[0]
    h=np.dot(x,theta)
    return np.sum(np.square(y-h))/(2.0*m)

#print(cost(X,y,theta))

def gradientDescent(x,y,theta,alpha=0.01,iter=1500):
    m=y.shape[0]

    for i in range(iter):
        h=hypothesis(x,theta)
        error=h-y
        update=np.dot(error,x)
        theta=np.subtract(theta,((alpha*update)/m))


    print('theta',theta)
    print('hyp',h)
    print('y',y)
    print('error',error)
    print('cost',cost(x,y,theta))

print(gradientDescent(X,y,theta))

and the output I get is:

theta [ nan  nan  nan]
hyp [ nan  nan  nan  nan  nan  nan  nan  nan  nan]
y [  79.4   41.5   97.5   36.1   63.2   39.5   69.8  103.5   29.5]
error [ nan  nan  nan  nan  nan  nan  nan  nan  nan]
cost nan

Can someone please help me solve this? I have been stuck on this for almost 5 hours!

Upvotes: 0

Views: 144

Answers (1)

Yang

Reputation: 311

Your learning rate is too large for gradient descent to converge: with alpha=0.01 each update overshoots, theta grows without bound until it overflows, and that overflow shows up as nan in theta, the hypothesis and the cost. Try a much smaller step size, e.g. alpha=0.00001.
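
As a minimal sketch of the fix, assuming the X, y, cost and gradientDescent definitions from the question are already in scope, lowering alpha keeps the updates finite:

theta = np.array([0.0, 0.0, 0.0])            # start from float zeros
print('cost before', cost(X, y, theta))      # finite starting cost
gradientDescent(X, y, theta, alpha=0.00001)  # smaller step: theta stays finite and the cost decreases

Because the second feature (values from 10 to 90) is much larger than the first, the gradient in that direction dominates and forces the small step size; with alpha this small the fit improves slowly, so more iterations may be needed.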

Upvotes: 1
