fluency03

Reputation: 2697

fmin_cg: Desired error not necessarily achieved due to precision loss

I have the following code to minimize the cost function using its gradient.

def trainLinearReg( X, y, lamda ):
    # theta = zeros( shape(X)[1], 1 )
    theta = random.rand( shape(X)[1], 1 ) # random initialization of theta

    result = scipy.optimize.fmin_cg( computeCost, fprime = computeGradient, x0 = theta, 
                                     args = (X, y, lamda), maxiter = 200, disp = True, full_output = True )
    return result[1], result[0]

But I am getting this warning:

Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 8403387632289934651424768.000000
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3

My computeCost and computeGradient are defined as

def computeCost( theta, X, y, lamda ):
    theta = theta.reshape( shape(X)[1], 1 )
    m     = shape(y)[0]
    J     = 0
    grad  = zeros( shape(theta) )

    h = X.dot(theta)
    squaredErrors = (h - y).T.dot(h - y)
    # theta[0] = 0.0
    J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))

    return J[0]

def computeGradient( theta, X, y, lamda ):
    theta = theta.reshape( shape(X)[1], 1 )
    m     = shape(y)[0]
    J     = 0
    grad  = zeros( shape(theta) )

    h = X.dot(theta)
    squaredErrors = (h - y).T.dot(h - y)
    # theta[0] = 0.0
    J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))
    grad = (1.0 / m) * (X.T.dot(h - y)) + (lamda / m) * theta

    return grad.flatten()

I have reviewed these similar questions:

scipy.optimize.fmin_bfgs: “Desired error not necessarily achieved due to precision loss”

scipy.optimize.fmin_cg: "Desired error not necessarily achieved due to precision loss."

scipy is not optimizing and returns "Desired error not necessarily achieved due to precision loss"

But I still cannot find a solution to my problem. How can I get the minimization to converge instead of getting stuck at the very first iterations?


Upvotes: 2

Views: 9008

Answers (4)

Vicrobot

Reputation: 3988

I faced this problem today.

I then noticed that my cost function was implemented the wrong way and was producing errors on a huge scale, which is why scipy failed with this warning. I hope this helps someone like me.
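If you suspect a mis-implemented cost/gradient pair, one quick sanity check (a hedged sketch, not code from this answer) is to evaluate the cost at the initial point and compare the analytical gradient against a finite-difference approximation with scipy.optimize.check_grad. The cost and grad functions below are 1-D re-implementations of the question's regularized linear-regression formulas, run on synthetic data:

import numpy as np
from scipy.optimize import check_grad

def cost(theta, X, y, lamda):
    # regularized linear-regression cost, written for 1-D theta and y
    m = y.shape[0]
    r = X.dot(theta) - y
    return (r.dot(r) + lamda * theta.dot(theta)) / (2.0 * m)

def grad(theta, X, y, lamda):
    # analytical gradient of the cost above
    m = y.shape[0]
    return (X.T.dot(X.dot(theta) - y) + lamda * theta) / m

rng = np.random.RandomState(0)
X = rng.rand(20, 3)      # small synthetic design matrix
y = rng.rand(20)         # synthetic targets
theta0 = np.zeros(3)

print(cost(theta0, X, y, 1.0))                     # should be a modest value, not ~1e24
print(check_grad(cost, grad, theta0, X, y, 1.0))   # should be tiny, e.g. ~1e-7

If the printed cost is astronomically large or the gradient check is far from zero, the implementation (or the data scaling) is the place to look.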

Upvotes: 1

Abhishek Srivastava

Reputation: 561

I too faced this problem, and even after searching a lot for solutions, nothing worked because the answers I found were not clearly explained.

Then I read the documentation for scipy.optimize.fmin_cg, where it is clearly mentioned that the parameter x0 must be a 1-D array.

My approach was the same as yours: I passed a 2-D matrix as x0, and I always got a precision error or a divide-by-zero error along with the same warning you got.

Then I changed my approach: I passed theta as a 1-D array and converted it into a 2-D matrix inside the computeCost and computeGradient functions. That worked for me and I got the results I expected.

My solution for logistic regression:

import numpy as np
import scipy.optimize as opt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# m = number of training examples, features = number of columns of X
# (X and Y are assumed to be pandas DataFrames, hence the .values calls below)
m, features = X.shape

theta = np.zeros(features)          # 1-D initial guess, as fmin_tnc expects

def computeCost(theta, X, Y):
    x = np.matrix(X.values)
    y = np.matrix(Y.values)
    theta = np.matrix(theta)        # converted back to 2-D inside the function
    xtheta = np.matmul(x, theta.T)
    hx = sigmoid(xtheta)
    cost = np.multiply(y, np.log(hx)) + np.multiply(1 - y, np.log(1 - hx))
    return -np.sum(cost) / m

def computeGradient(theta, X, Y):
    x = np.matrix(X.values)
    y = np.matrix(Y.values)
    theta = np.matrix(theta)        # converted back to 2-D inside the function
    grad = np.zeros(features)
    xtheta = np.matmul(x, theta.T)
    hx = sigmoid(xtheta)
    error = hx - y
    for i in range(features):
        term = np.multiply(error, x[:, i])
        grad[i] = np.sum(term) / m
    return grad                     # 1-D gradient, as the optimizer expects

result = opt.fmin_tnc(func=computeCost, x0=theta, fprime=computeGradient, args=(X, Y))

print(computeCost(result[0], X, Y))

Note again that theta has to be a 1-D array.

So in your code, modify theta in trainLinearReg to theta = random.randn(features).
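As a hedged sketch of how the question's trainLinearReg could look with a 1-D x0, following this advice (computeCost and computeGradient are the question's functions, assumed to reshape theta internally and to return a scalar cost and a flattened gradient):

import numpy as np
import scipy.optimize

def trainLinearReg(X, y, lamda):
    features = X.shape[1]
    theta = np.random.randn(features)   # random 1-D initialization, as suggested above
    result = scipy.optimize.fmin_cg(computeCost, fprime=computeGradient, x0=theta,
                                    args=(X, y, lamda), maxiter=200,
                                    disp=True, full_output=True)
    # full_output=True returns (xopt, fopt, func_calls, grad_calls, warnflag)
    return result[1], result[0]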

Upvotes: 1

Elmira Birang

Reputation: 71

For my implementation, scipy.optimize.fmin_cg also failed with the above-mentioned error for some initial guesses. Then I changed to the BFGS method and it converged:

 scipy.optimize.minimize(fun, x0, args=(), method='BFGS', jac=None, tol=None, callback=None, options={'disp': False, 'gtol': 1e-05, 'eps': 1.4901161193847656e-08, 'return_all': False, 'maxiter': None, 'norm': inf})

It seems that this error is still hard to avoid with CG, as CG can end up taking a non-descent direction.
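A hedged, self-contained sketch of the switch to BFGS via scipy.optimize.minimize (the cost and grad functions below are 1-D re-implementations of the question's regularized linear-regression formulas, and the data is synthetic, standing in for X_poly and y):

import numpy as np
from scipy.optimize import minimize

def cost(theta, X, y, lamda):
    m = y.shape[0]
    r = X.dot(theta) - y
    return (r.dot(r) + lamda * theta.dot(theta)) / (2.0 * m)

def grad(theta, X, y, lamda):
    m = y.shape[0]
    return (X.T.dot(X.dot(theta) - y) + lamda * theta) / m

rng = np.random.RandomState(0)
X = np.c_[np.ones((30, 1)), rng.rand(30, 2)]   # synthetic design matrix with intercept column
y = rng.rand(30)

res = minimize(cost, np.zeros(X.shape[1]), args=(X, y, 1.0),
               method='BFGS', jac=grad,
               options={'disp': True, 'gtol': 1e-5})
print(res.x, res.fun)   # optimal theta and the final cost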

Upvotes: 1

fluency03

Reputation: 2697

ANSWER:

I solved this problem based on @lejlot's comments below. He is right: the values in the data set X were too large because I did not assign the normalized features back to the correct variable. Even though this is a small mistake, it shows where to look when you run into such problems: a cost function value this large suggests that something is wrong with the data set.

The previous, wrong version:

X_poly            = polyFeatures(X, p)
X_norm, mu, sigma = featureNormalize(X_poly)
X_poly            = c_[ones((m, 1)), X_poly]

The correct version:

X_poly            = polyFeatures(X, p)
X_poly, mu, sigma = featureNormalize(X_poly)
X_poly            = c_[ones((m, 1)), X_poly]

where X_poly is then used for training as

cost, theta = trainLinearReg(X_poly, y, lamda)
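For reference, featureNormalize is not shown in the post; a minimal sketch of what such a function typically does (an assumption, not the code from this post) is to scale every column to zero mean and unit standard deviation:

import numpy as np

def featureNormalize(X):
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)   # sample standard deviation; ddof=1 is an assumption
    X_norm = (X - mu) / sigma
    return X_norm, mu, sigma

Without assigning this result back to X_poly, the high-degree polynomial columns keep their raw magnitudes, and the squared-error cost blows up to values like the 8.4e24 seen in the warning above.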

Upvotes: 1
