Reputation: 21
I am implementing Andrew Ng's machine learning course in Python. In programming exercise 2, on the first question, I am getting the right answers for the cost function and gradient, but when computing the optimised theta I get a disastrous answer!
I have already tried my best but am not able to find the error.
import numpy as np
import scipy.optimize as opt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def cost_compute(theta, x, y):
    J = (-1/m) * np.sum(np.multiply(Y, np.log(sigmoid(X @ theta)))
                        + np.multiply((1-Y), np.log(1 - sigmoid(X @ theta))))
    return J
[m, n] = X.shape
X = np.hstack( (np.ones((m,1)) , X) )
Y = Y[:, np.newaxis]
theta = np.zeros((n+1,1))
def grad(theta, X, Y):
    temp = (1/m) * X.T @ (sigmoid(X @ theta) - Y)
    return temp
temp = opt.fmin_tnc(func=cost_compute, x0=theta.flatten(), fprime=grad, args=(X, Y.flatten()))
print(temp)
The expected cost is 0.693 and I am getting it. The expected gradient is also exactly the same as my actual answer. But the optimised theta I am getting is array([4.42735730e-05, 5.31690927e-03, 4.98646266e-03]), which gives a new cost of around 60 (instead of 0.203)!
Upvotes: 1
Views: 483
Reputation: 21
I did some tests by changing the shapes of the arrays, flattening them and reshaping them, but nothing worked.
Since we pass a one-dimensional theta to fmin_tnc by flattening it, I changed the gradient function on the assumption that it will receive a one-dimensional theta instead of a 3x1 array.
Earlier, it was
def grad(theta, X, Y):
    temp = (1/m) * X.T @ (sigmoid(X @ theta) - Y)
    return temp
Now, it is
def grad(theta, X, Y):
    temp = (1/m) * (X.T @ (sigmoid(X @ theta[:, np.newaxis]) - Y))
    return temp
Now it works!
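For reference, here is a minimal standalone sketch (using made-up shapes, not the exercise data) of why the reshape matters: mixing a flat (m,) vector, which is what you get when theta is one-dimensional, with an (m, 1) column silently broadcasts to an (m, m) array instead of performing an elementwise operation.

import numpy as np

h = np.zeros(5)                        # shape (5,): what a 1-D theta produces for sigmoid(X @ theta)
Y = np.zeros((5, 1))                   # shape (5, 1): the column vector built with np.newaxis

print((h - Y).shape)                   # (5, 5) -- silent broadcasting, not what we want
print((h[:, np.newaxis] - Y).shape)    # (5, 1) -- the intended elementwise difference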
Upvotes: 1
Reputation: 1460
The problem is that you are calling np.sum together with np.multiply instead of using e.g. np.dot; these operations are in general not equivalent.
The np.multiply operation calculates the elementwise product, while np.dot calculates the proper matrix product; see this answer on SO by Anuj Gautam:

np.dot is the dot product of two matrices.

|A B| . |E F| = |A*E+B*G  A*F+B*H|
|C D|   |G H|   |C*E+D*G  C*F+D*H|

Whereas np.multiply does an element-wise multiplication of two matrices.

|A B| ⊙ |E F| = |A*E  B*F|
|C D|   |G H|   |C*G  D*H|
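As a tiny illustration of that difference, with two arbitrary 2x2 matrices:

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(np.multiply(A, B))   # elementwise product: [[ 5 12] [21 32]]
print(np.dot(A, B))        # matrix product:      [[19 22] [43 50]]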
To calculate the cross entropy loss, the matrix multiplication is needed.
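In vectorised form, with h = sigmoid(X @ theta) an m x 1 column and Y an m x 1 column of labels, the loss is

J(theta) = -(1/m) * ( Y.T @ log(h) + (1 - Y).T @ log(1 - h) )

where each term is a (1 x m) row times an (m x 1) column, i.e. a matrix product rather than an elementwise one.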
Changing your cost function to
def cost_compute(theta, X, Y):
    J = (-1/m) * (np.dot(Y.T, np.log(sigmoid(X @ theta)))
                  + np.dot((1-Y).T, np.log(1 - sigmoid(X @ theta))))
    return J
results in the desired result for me:
>> cost_compute(temp[0], X, Y)
array([0.2034977])
In addition, the case of your arguments x and y of the cost_compute function is wrong, as you use the capitalized versions X and Y inside the function.
Upvotes: 0