Y.C.

Reputation: 169

NumPy array operations returning NaN values despite no NaN values in input

I am running a linear regression (using Gradient Descent Analysis / GDA) on data imported from a .csv file (data_axis and data are the exported dates and stock market prices, respectively). The code below returns [nan nan nan nan nan nan] as the theta value. The square error is also nan.

Error messages: 'overflow encountered in multiply', 'invalid value encountered in add'

import numpy as np
import matplotlib.pyplot as plt

Xdata = np.array(data_axis)
Xdata = Xdata.reshape(-1,2)
print(Xdata.shape)

Ydata = np.array(data[0:783])
Ydata = Ydata[::-1]
print(Ydata.shape)

def phi(x):
  return np.array([1,x[0],x[1],x[0]*x[0],x[1]*x[0],x[1]*x[1]])

def gda(X, Y):
  n= len(X)
  theta= np.zeros(len(X[0]))
  alpha= 0.5
  iterations = 1000
  for j in range(iterations):
    for i in range(len(X)):
      theta += alpha*(Y[i] - np.dot(theta,X[i]))*X[i]/n
  return theta

def linear_regression_with_features(X, Y, phi):
  phi_X = np.array([ phi(x) for x in X])
  return gda(phi_X,Y)

theta = linear_regression_with_features(Xdata, Ydata, phi)

def h(theta, x):
  return np.dot(theta,phi(x))

error = np.array([Ydata[i]-h(theta,Xdata[i]) for i in range(len(Xdata))])
s_error = np.dot(error,error)
print('theta= ', theta)
print('square error= ', s_error)

plt.plot(Xdata,Ydata,'co')
plt.plot(h(theta,Xdata),'r-') 

The code does successfully return and plot a linear regression for randomly generated inputs: Xdata = np.random.rand(783,2), Ydata = np.array([ 2-4*x[0]+3*x[1]+x[0]*x[0]+2*x[0]*x[1]-3*x[1]*x[1] for x in Xdata]).

I checked that there are no NaN values in the .csv file. I searched for the error message, and read that some Python & NumPy operations involving extremely small or extremely large numbers may output a NaN value as the result. Could this be my issue? Or is there something else which may be fixed?

Upvotes: 0

Views: 1292

Answers (1)

Y.C.

Reputation: 169

I found the problem: it was a combination of very small values, very large values, and input matrices being the wrong shape. Here, the X-axis input and the Y-axis input had to have shapes (n,2) and (n,) respectively.
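
The shape requirement can be checked up front, before any gradient step runs. This is only a minimal sketch: check_regression_inputs is a hypothetical helper name, not something from the original code, and it assumes two input columns as in the question.

```python
import numpy as np

def check_regression_inputs(X, Y):
    """Hypothetical helper: fail early on wrong shapes or non-finite values."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # X must be a 2-D array with exactly two columns: shape (n, 2).
    if X.ndim != 2 or X.shape[1] != 2:
        raise ValueError(f"X must have shape (n, 2), got {X.shape}")
    # Y must be 1-D with one target per row of X: shape (n,).
    if Y.shape != (X.shape[0],):
        raise ValueError(f"Y must have shape ({X.shape[0]},), got {Y.shape}")
    # NaN or inf in the inputs would propagate into theta.
    if not (np.isfinite(X).all() and np.isfinite(Y).all()):
        raise ValueError("inputs contain NaN or inf")
    return X, Y
```

Calling check_regression_inputs(Xdata, Ydata) right after loading the data would surface a mismatch such as 783 targets against a different number of rows, instead of letting it show up later as nan.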

Rounding did not help, but rearranging the data from scratch did.
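
For the very large and very small values, one common remedy (not necessarily what I ended up doing) is to standardize the feature columns before running gradient descent, so that a fixed step size cannot blow up into the overflow warnings from the question. The sketch below is written under assumptions: standardize and gda_batch are hypothetical names, the data is synthetic rather than the real .csv, and it uses a batch update with alpha=0.1 instead of the per-sample update with alpha=0.5.

```python
import numpy as np

def phi(x):
    # Same quadratic feature map as in the question.
    return np.array([1, x[0], x[1], x[0]*x[0], x[1]*x[0], x[1]*x[1]])

def standardize(F):
    """Hypothetical helper: zero mean, unit variance per feature column."""
    mu = F.mean(axis=0)
    sigma = F.std(axis=0)
    sigma[sigma == 0] = 1.0      # constant columns: avoid division by zero
    Z = (F - mu) / sigma
    Z[:, 0] = 1.0                # restore the bias column
    return Z, mu, sigma

def gda_batch(F, Y, alpha=0.1, iterations=5000):
    """Batch gradient descent on the mean squared error."""
    n = len(F)
    theta = np.zeros(F.shape[1])
    for _ in range(iterations):
        residual = Y - F @ theta
        theta += alpha * (F.T @ residual) / n
    return theta

# Synthetic stand-in for large inputs (assumption, not the real data).
rng = np.random.default_rng(0)
X = rng.uniform(1000.0, 2000.0, size=(783, 2))
Y = 0.01 * X[:, 0] * X[:, 1] + rng.normal(0.0, 1.0, size=783)

F = np.array([phi(x) for x in X])
Z, mu, sigma = standardize(F)
theta = gda_batch(Z, Y)

error = Y - Z @ theta
print('theta= ', theta)
print('mean square error= ', np.mean(error ** 2))
```

Note that theta here lives in the standardized feature space, so new inputs have to go through phi and the same mu and sigma before predicting.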

Upvotes: 1
