Reputation: 103
I am trying to use fmin_ncg to minimize my cost function. But the results I get back are not minimized: I get the same result I would get without advanced optimization, and I know for a fact that it can be minimized further.
PS: I am trying to code assignment 2 of Coursera's ML course.
My cost fn:
def costFn(theta, X, y, m, lam):
    h = sigmoid(X.dot(theta))
    theta0 = theta
    J = 1 / m * np.sum((-(y * np.log(h))) - ((1-y) * np.log(1-h))) + (lam/(2*m) * theta0.T.dot(theta0))
    return J.flatten()
X would look something like this:
[[ 1.00000000e+00 5.12670000e-02 6.99560000e-01 ..., 6.29470940e-04
8.58939846e-03 1.17205992e-01]
[ 1.00000000e+00 -9.27420000e-02 6.84940000e-01 ..., 1.89305413e-03
-1.39810280e-02 1.03255971e-01]
[ 1.00000000e+00 -2.13710000e-01 6.92250000e-01 ..., 1.04882142e-02
-3.39734512e-02 1.10046893e-01]
...,
[ 1.00000000e+00 -4.84450000e-01 9.99270000e-01 ..., 2.34007252e-01
-4.82684337e-01 9.95627986e-01]
....
y is a bunch of 0s and 1s:
[[1]
[1]
[1]
[1]
...
[0]
[0]]
X.shape = (118, 28)
y.shape = (118, 1)
My grad function:
def grad(theta, X, y, m, lam):
    h = sigmoid(X.dot(theta))
    theta0 = initial_theta
    gg = 1.0 / m * ((X.T.dot(h-y)) + (lam * theta0))
    return gg.flatten()
Using just my costFn and grad, I get the following:
Cost at initial theta (zeros): 0.69314718056
With fmin_ncg:
xopt = fmin_ncg(costFn, fprime=grad, x0=initial_theta, args=(X, y, m, lam),
                maxiter=400, disp=True, full_output=True)
I get:
Optimization terminated successfully.
Current function value: 0.693147
Iterations: 1
Function evaluations: 2
Gradient evaluations: 4
Hessian evaluations: 0
Using Octave, my J after advanced optimization should be:
0.52900
What am I doing wrong?
EDIT: I got my optimization to work:
y1 = y.flatten()
Result = op.minimize(fun=costFn,
                     x0=initial_theta,
                     args=(X, y1, m, lam),
                     method='CG',
                     options={'disp': True})
I get the costFn to be 0.52900, which is what I expected.
But the values of 'theta' are a bit off, so the accuracy is only 42% (see the sketch after the theta values below for what I mean by accuracy). It's supposed to be 83%.
The values of theta I got:
[ 1.14227089 0.60130664 1.16707559 -1.87187892 -0.91534354 -1.26956697
0.12663015 -0.36875537 -0.34522652 -0.17363325 -1.42401493 -0.04872243
-0.60650726 -0.269242 -1.1631064 -0.24319088 -0.20711764 -0.04333854
-0.28026111 -0.28693582 -0.46918892 -1.03640373 0.02909611 -0.29266766
0.01725324 -0.32899144 -0.13795701 -0.93215664]
The actual values:
[1.273005 0.624876 1.177376 -2.020142 -0.912616 -1.429907 0.125668 -0.368551
-0.360033 -0.171068 -1.460894 -0.052499 -0.618889 -0.273745 -1.192301
-0.240993 -0.207934 -0.047224 -0.278327 -0.296602 -0.453957 -1.045511
0.026463 -0.294330 0.014381 -0.328703 -0.143796 -0.924883]
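By accuracy I mean the fraction of training examples where a 0.5-thresholded prediction matches y. A minimal sketch of that check (assumes the same sigmoid helper and the fitted theta in Result.x):

theta_opt = Result.x
predictions = (sigmoid(X.dot(theta_opt)) >= 0.5).astype(int)  # hypothetical helper names
accuracy = np.mean(predictions == y1) * 100  # percentage of correctly classified examples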
Upvotes: 0
Views: 188
Reputation: 66835
First of all, your gradient is invalid:
def grad(theta, X, y, m, lam):
    h = sigmoid(X.dot(initial_theta))
    theta0 = initial_theta
    gg = 1 / m * ((X.T.dot(h-y)) + (lam * theta0))
    return gg.flatten()
This function never uses theta; you put initial_theta there instead, which is incorrect.
There is a similar error in the cost:
def costFn(theta, X, y, m, lam):
    h = sigmoid(X.dot(initial_theta))
    theta0 = theta
    J = 1 / m * np.sum((-(y * np.log(h))) - ((1-y) * np.log(1-h))) + (lam/(2*m) * theta0.T.dot(theta0))
    return J.flatten()
You have an odd mix of theta and initial_theta, which also does not make sense; only theta should appear inside. As a side note, there should be no need for flattening: your cost function should return a scalar, so if you have to flatten it, something is wrong in your computations.
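For reference, a minimal sketch of how the two functions could look once only theta is used (an illustration, not your original code; it assumes a sigmoid helper is defined and that y is passed as a 1-D array, as in the later edit with y.flatten()):

def costFn(theta, X, y, m, lam):
    # use only the theta that the optimizer passes in
    h = sigmoid(X.dot(theta))
    J = 1.0 / m * np.sum(-y * np.log(h) - (1 - y) * np.log(1 - h)) \
        + lam / (2.0 * m) * theta.dot(theta)
    return J  # a scalar, no flattening needed

def grad(theta, X, y, m, lam):
    h = sigmoid(X.dot(theta))
    gg = 1.0 / m * (X.T.dot(h - y) + lam * theta)  # 1.0 / m: see the note below
    return gg.flatten()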
Also worth checking: what is your m? If it is an integer and you are using Python 2.x, then 1 / m equals zero, since it is integer division. You should use 1.0 / m instead (in both functions).
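For example, using m = 118 from the question's data:

m = 118
print(1 / m)    # Python 2: 0 (integer division); Python 3: 0.00847...
print(1.0 / m)  # ~0.00847 on both versions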
Upvotes: 1