Dan Kyuso

Reputation: 21

optimize.fmin_tnc in scipy.optimize is not giving the right answer?

I am implementing Andrew Ng's machine learning course in Python. In programming exercise 2, on the first question, I am getting the right answers for the cost function and the gradient, but when calculating the optimized theta I get a disastrous answer!

I have tried my best but am not able to find the error.

import numpy as np
import scipy.optimize as opt

# X, Y: training data from the exercise (loaded earlier)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def cost_compute(theta, x, y):
    J = (-1/m) * np.sum(np.multiply(Y, np.log(sigmoid(X @ theta)))
        + np.multiply((1-Y), np.log(1 - sigmoid(X @ theta))))
    return J

[m, n] = X.shape
X = np.hstack((np.ones((m, 1)), X))   # add the intercept column
Y = Y[:, np.newaxis]                  # make Y a column vector of shape (m, 1)
theta = np.zeros((n+1, 1))

def grad(theta, X, Y):
    temp = (1/m) * X.T @ (sigmoid(X @ theta) - Y)
    return temp

temp = opt.fmin_tnc(func=cost_compute, x0=theta.flatten(), fprime=grad, args=(X, Y.flatten()))

print(temp)

The expected cost is 0.693 and I am getting it. The expected gradient also matches the actual answer exactly. But the optimized theta I am getting is array([4.42735730e-05, 5.31690927e-03, 4.98646266e-03]), which gives a new cost of around 60 (instead of 0.203)!

Upvotes: 1

Views: 483

Answers (2)

Dan Kyuso

Reputation: 21

I did some tests, changing the shapes of the arrays, flattening them, and reshaping them, but nothing worked.

Since we pass a one-dimensional theta to fmin_tnc (by flattening it), I changed the gradient function on the assumption that it receives a one-dimensional theta instead of a 3×1 array.

Earlier, it was

def grad( theta, X, Y):
    temp = (1/m) * X.T @ (sigmoid(X @ theta) - Y)  
    return temp

Now, it is

def grad(theta, X, Y):
    # theta arrives flattened; restore the column shape before the matrix products
    temp = (1/m) * (X.T @ (sigmoid(X @ theta[:, np.newaxis]) - Y))
    return temp

Now it works!

Upvotes: 1

jdamp

Reputation: 1460

The problem is that you are calling np.sum together with np.multiply instead of using e.g. np.dot; these operations are not equivalent in general.

The np.multiply operation calculates the elementwise product, while np.dot calculates the proper matrix product; see this answer on SO by Anuj Gautam:

np.dot is the dot product of two matrices.

|A B| . |E F| = |A*E+B*G A*F+B*H|
|C D|   |G H|   |C*E+D*G C*F+D*H|

Whereas np.multiply does an element-wise multiplication of two matrices.

|A B| ⊙ |E F| = |A*E B*F|
|C D|   |G H|   |C*G D*H|
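
Here is a small sketch of the difference in NumPy (the arrays a and b are just illustrative), including the broadcasting pitfall that makes the summed element-wise product blow up when one operand is a column vector and the other is one-dimensional, which is what happens in the cost function once theta is flattened:

import numpy as np

a = np.array([[1.0], [2.0], [3.0]])   # column vector, shape (3, 1)
b = np.array([4.0, 5.0, 6.0])         # 1-D array, shape (3,)

# Element-wise product: broadcasting turns (3, 1) * (3,) into a (3, 3) matrix,
# so summing it no longer gives the inner product of the two vectors.
print(np.sum(np.multiply(a, b)))      # 90.0

# Matrix product of the transposed column with the 1-D array: the actual inner product.
print(np.dot(a.T, b))                 # [32.]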

To calculate the cross-entropy loss, the matrix product is what is needed here.

Changing your cost function to

def cost_compute(theta, X, Y):
    J = (-1/m) * (np.dot(Y.T, np.log(sigmoid(X @ theta)))
        + np.dot((1-Y).T, np.log(1 - sigmoid(X @ theta))))
    return J

gives the desired result for me:

>>> cost_compute(temp[0], X, Y)
array([0.2034977])

In addition, the casing of the arguments x and y of your cost_compute function is wrong: you use the capitalized versions X and Y inside the function, so the values passed in are silently ignored in favor of the globals.
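
As a minimal sketch (not the original code), the cost function could use its own arguments consistently so it no longer depends on the module-level X and Y; here len(y) stands in for m:

def cost_compute(theta, x, y):
    # Use the arguments that scipy passes in, not the module-level X and Y.
    h = sigmoid(x @ theta)
    return (-1 / len(y)) * (np.dot(y.T, np.log(h)) + np.dot((1 - y).T, np.log(1 - h)))

With the lowercase arguments used throughout, the result no longer changes depending on what shape the global Y happens to have.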

Upvotes: 0
