Reputation: 5788
I'm using the following regularized cost() and gradient() functions:
def cost(theta, x, y, lam):
    theta = theta.reshape(1, len(theta))
    predictions = sigmoid(np.dot(x, np.transpose(theta))).reshape(len(x), 1)
    regularization = (lam / (len(x) * 2)) * np.sum(np.square(np.delete(theta, 0, 1)))
    complete = -1 * np.dot(np.transpose(y), np.log(predictions)) \
               - np.dot(np.transpose(1 - y), np.log(1 - predictions))
    return np.sum(complete) / len(x) + regularization
def gradient(theta, x, y, lam):
    theta = theta.reshape(1, len(theta))
    predictions = sigmoid(np.dot(x, np.transpose(theta))).reshape(len(x), 1)
    theta_without_intercept = theta.copy()
    theta_without_intercept[0, 0] = 0
    assert(theta_without_intercept.shape == theta.shape)
    regularization = (lam / len(x)) * np.sum(theta_without_intercept)
    return np.sum(np.multiply((predictions - y), x), 0) / len(x) + regularization
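The snippets above assume a sigmoid helper that isn't shown in the question; a minimal sketch of the standard logistic function it presumably refers to:

```python
import numpy as np

def sigmoid(z):
    """Standard logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))
```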
With these functions and scipy.optimize.fmin_bfgs() I'm getting the following output (which is almost correct):
Starting loss value: 0.69314718056
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 0.208444
Iterations: 8
Function evaluations: 51
Gradient evaluations: 39
7.53668131651e-08
Trained loss value: 0.208443907192
The formula for regularization is below. If I comment out the regularized inputs above, scipy.optimize.fmin_bfgs() works well and returns the local optimum correctly. Any ideas why?
UPDATE:
After the additional comments, I updated the cost and gradient regularization (in the code above), but this warning still appears (new outputs above). scipy's check_grad function returns the following value: 7.53668131651e-08.
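check_grad compares an analytic gradient against a finite-difference approximation of the cost; values near zero (like the 7.5e-08 above) mean the two agree. A minimal, self-contained illustration on a simple quadratic (not the question's cost function):

```python
import numpy as np
from scipy.optimize import check_grad

def f(w):
    return np.sum(w ** 2)

def grad_f(w):
    return 2 * w

w0 = np.array([1.0, -2.0, 3.0])
# check_grad returns the 2-norm of the difference between grad_f(w0)
# and a finite-difference estimate of the gradient of f at w0
err = check_grad(f, grad_f, w0)
```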
UPDATE 2:
I'm using the UCI Machine Learning Iris dataset, and based on a One-vs-All classification model I'm training the first results for Iris-setosa.
Upvotes: 1
Views: 215
Reputation: 5788
The issue was in my calculation: for some reason I summed the theta values in the regularization term: regularization = (lam / len(x)) * np.sum(theta_without_intercept). We don't need np.sum of the regularized values at this stage; that collapses the per-parameter penalties into a single average regularization applied to every theta, which skews the next prediction loss. Thanks for the help, anyway.
Gradient method:
def gradient(theta, x, y, lam):
    theta_len = len(theta)
    theta = theta.reshape(1, theta_len)
    predictions = sigmoid(np.dot(x, np.transpose(theta))).reshape(len(x), 1)
    theta_wo_bias = theta.copy()
    theta_wo_bias[0, 0] = 0
    assert (theta_wo_bias.shape == theta.shape)
    regularization = np.squeeze(((lam / len(x)) * theta_wo_bias).reshape(theta_len, 1))
    return np.sum(np.multiply((predictions - y), x), 0) / len(x) + regularization
Output:
Starting loss value: 0.69314718056
Optimization terminated successfully.
Current function value: 0.201681
Iterations: 30
Function evaluations: 32
Gradient evaluations: 32
7.53668131651e-08
Trained loss value: 0.201680992316
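The difference between the buggy and corrected regularization terms is one of shape: np.sum collapses the per-parameter penalties into a single scalar, which broadcasting then adds to every gradient component. A small sketch of the two (with made-up lam, sample count, and theta values):

```python
import numpy as np

lam, m = 1.0, 100
theta = np.array([[0.5, -1.0, 2.0]])   # row vector, bias first
theta_wo_bias = theta.copy()
theta_wo_bias[0, 0] = 0                # don't penalize the intercept

# buggy: a single scalar, added to every gradient component
bad = (lam / m) * np.sum(theta_wo_bias)

# fixed: one penalty per parameter
good = np.squeeze((lam / m) * theta_wo_bias)

print(bad)   # 0.01
print(good)  # [ 0.   -0.01  0.02]
```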
Upvotes: 0
Reputation: 488
As you are trying to perform L2 regularization, you should modify the value in your cost function from
regularization = (lam / len(x) * 2) * np.sum(np.square(np.delete(theta, 0, 1)))
to
regularization = (lam / (len(x) * 2)) * np.sum(np.square(np.delete(theta, 0, 1)))
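The difference is operator precedence: lam / len(x) * 2 evaluates left to right as (lam / len(x)) * 2, i.e. 2*lam/m instead of the intended lam/(2m). A quick check with hypothetical values:

```python
lam, m = 0.1, 100

wrong = lam / m * 2        # (lam / m) * 2  -> 0.002
right = lam / (m * 2)      # lam / (2 * m)  -> 0.0005

print(wrong, right)
```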
Also, the gradient part of the regularization should have the same shape as the parameter vector theta. Hence I think the correct value would be
theta_without_intercept = theta.copy()
theta_without_intercept[0] = 0 # You are not penalizing the intercept in your cost function, i.e. theta_0
assert(theta_without_intercept.shape == theta.shape)
regularization = (lam / len(x)) * theta_without_intercept
Otherwise, the gradient won't be correct. You can then verify your gradient using the scipy.optimize.check_grad() function.
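Putting both fixes together, a simplified end-to-end sketch that passes check_grad (using a 1-D theta instead of the question's row-vector shape, and synthetic data rather than the Iris set):

```python
import numpy as np
from scipy.optimize import check_grad

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, x, y, lam):
    m = len(x)
    p = sigmoid(x @ theta)
    # lam / (2m) with explicit parentheses, intercept excluded
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return (-y @ np.log(p) - (1 - y) @ np.log(1 - p)) / m + reg

def gradient(theta, x, y, lam):
    m = len(x)
    p = sigmoid(x @ theta)
    theta_wo_bias = theta.copy()
    theta_wo_bias[0] = 0.0          # no penalty on the intercept
    # per-parameter penalty, no np.sum: same shape as theta
    return x.T @ (p - y) / m + (lam / m) * theta_wo_bias

rng = np.random.default_rng(0)
x = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
y = (rng.random(20) > 0.5).astype(float)
theta0 = rng.normal(size=3) * 0.1

# small value (e.g. ~1e-6 or less) means the gradient matches the cost
err = check_grad(cost, gradient, theta0, x, y, 0.1)
```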
Upvotes: 2