Reputation: 217
I'm using the L-BFGS-B optimizer to find the minima of a function; this will help me calculate the sharpness of the function. However, I'm not sure whether the following message is normal (i.e. is there something wrong with my program, or is this output typical?). See below:
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 28149514 M = 10
At X0 0 variables are exactly at the bounds
At iterate 0 f= -3.59325D+00 |proj g|= 2.10249D-03
At iterate 1 f= -2.47853D+01 |proj g|= 4.20499D-03
Bad direction in the line search;
refresh the lbfgs memory and restart the iteration.
At iterate 2 f= -2.53202D+01 |proj g|= 4.17686D-03
At iterate 3 f= -2.53202D+01 |proj g|= 4.17686D-03
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
***** 3 43 ****** 0 ***** 4.177D-03 -2.532D+01
F = -25.320247650146484
CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
Warning: more than 10 function and gradient
evaluations in the last line search. Termination
may possibly be caused by a bad search direction.
I got the following sharpness anyway, which is relatively consistent with the paper I'm trying to reproduce: tensor(473.0201). It's just that I'm a bit concerned about the message above.
Here is my code for computing sharpness:
import logging
import warnings

import numpy as np
import scipy.optimize as sciopt
import torch

# Note: get_minus_cross_entropy is defined elsewhere in the same repository;
# it returns the negated cross-entropy loss (and its gradient) at a given
# flat parameter vector.

def get_sharpness(data_loader, model, criterion, epsilon, manifolds=0):
    # extract the current flat parameter vector x0
    x0 = None
    for p in model.parameters():
        if x0 is None:
            x0 = p.data.view(-1)
        else:
            x0 = torch.cat((x0, p.data.view(-1)))
    x0 = x0.cpu().numpy()

    # get current f_x
    f_x0, _ = get_minus_cross_entropy(x0, data_loader, model, criterion)
    f_x0 = -f_x0
    logging.info('min loss f_x0 = {loss:.4f}'.format(loss=f_x0))

    # minimize the negated loss (i.e. maximize the loss) in a box around x0
    if 0 == manifolds:
        x_min = np.reshape(x0 - epsilon * (np.abs(x0) + 1), (x0.shape[0], 1))
        x_max = np.reshape(x0 + epsilon * (np.abs(x0) + 1), (x0.shape[0], 1))
        bounds = np.concatenate([x_min, x_max], 1)
        func = lambda x: get_minus_cross_entropy(x, data_loader, model, criterion, training=True)
        init_guess = x0
    else:
        # search only a random low-dimensional manifold: x = x0 + A.y
        warnings.warn("Small manifolds may not be able to explore the space.")
        assert manifolds <= x0.shape[0]
        #transformer = rp.GaussianRandomProjection(n_components=manifolds)
        #transformer.fit(np.random.rand(manifolds, x0.shape[0]))
        #A_plus = transformer.components_
        #A = np.linalg.pinv(A_plus)
        A_plus = np.random.rand(manifolds, x0.shape[0]) * 2. - 1.
        # normalize each row to unit length
        A_plus_norm = np.linalg.norm(A_plus, axis=1)
        A_plus = A_plus / np.reshape(A_plus_norm, (manifolds, 1))
        A = np.linalg.pinv(A_plus)
        abs_bound = epsilon * (np.abs(np.dot(A_plus, x0)) + 1)
        abs_bound = np.reshape(abs_bound, (abs_bound.shape[0], 1))
        bounds = np.concatenate([-abs_bound, abs_bound], 1)

        def func(y):
            floss, fg = get_minus_cross_entropy(x0 + np.dot(A, y), data_loader, model, criterion, training=True)
            # chain rule: the gradient w.r.t. y is A^T times the gradient w.r.t. x
            return floss, np.dot(np.transpose(A), fg)
        #func = lambda y: get_minus_cross_entropy(x0+np.dot(A, y), data_loader, model, criterion, training=True)

        init_guess = np.zeros(manifolds)
        #rand_selections = (np.random.rand(bounds.shape[0])+1e-6)*0.99
        #init_guess = np.multiply(1.-rand_selections, bounds[:,0])+np.multiply(rand_selections, bounds[:,1])

    minimum_x, f_x, d = sciopt.fmin_l_bfgs_b(func, init_guess, maxiter=10,
                                             bounds=list(bounds), disp=1, iprint=101)
                                             #factr=10.,
                                             #pgtol=1.e-12,
    f_x = -f_x
    logging.info('max loss f_x = {loss:.4f}'.format(loss=f_x))
    sharpness = (f_x - f_x0) / (1 + f_x0) * 100
    print(sharpness)

    # restore the original parameters to the model
    x0 = torch.from_numpy(x0).float()
    x0 = x0.cuda()
    x_start = 0
    for p in model.parameters():
        psize = p.data.size()
        peltnum = 1
        for s in psize:
            peltnum *= s
        x_part = x0[x_start:x_start + peltnum]
        p.data = x_part.view(psize)
        x_start += peltnum
    return sharpness
This code was taken from this repository: https://github.com/wenwei202/smoothout/blob/master/measure_sharpness.py
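For reference, this is roughly how I call it (model, criterion, and data_loader come from my training setup, and the epsilon values here are just illustrative):

# model / criterion / data_loader come from my training setup
for eps in (1e-3, 5e-4):
    sharpness = get_sharpness(data_loader, model, criterion, epsilon=eps)
    print('epsilon = {:g} -> sharpness = {}'.format(eps, sharpness))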
I'm mainly concerned about the exact accuracy of the result.
Upvotes: 2
Views: 1412
Reputation: 429
First, L-BFGS-B is only guaranteed to give a global minimum for a convex function.
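You can see this with a quick toy example (a minimal sketch on a made-up non-convex 1-D function, not your setup): the point the solver returns depends entirely on where it starts.

import numpy as np
import scipy.optimize as sciopt

# A 1-D function with many local minima.
f = lambda x: float(x[0]**2 + 10.0 * np.cos(2.0 * np.pi * x[0]))

# Same solver, different starting points -> different local minima.
for start in (0.4, 2.6):
    x, fx, info = sciopt.fmin_l_bfgs_b(f, np.array([start]), approx_grad=True)
    print(start, '->', x, fx)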
The message
CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
is the normal convergence message.
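For what it's worth, that tolerance is controlled by the factr argument of scipy.optimize.fmin_l_bfgs_b (the commented-out factr=10. in your code would make it far stricter than the default of 1e7): the solver stops once the relative reduction of f falls below factr times machine epsilon. A minimal sketch on a toy convex problem:

import numpy as np
import scipy.optimize as sciopt

# Smooth convex quadratic: L-BFGS-B stops with the REL_REDUCTION_OF_F
# message once (f_k - f_{k+1}) / max(|f_k|, |f_{k+1}|, 1) <= factr * eps.
f = lambda x: np.sum(x ** 2)
g = lambda x: 2.0 * x

x, fx, info = sciopt.fmin_l_bfgs_b(f, np.ones(5), fprime=g, factr=10.0)
print(info['task'])  # the solver's stated reason for stopping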
The warning you are getting says that there were many function/gradient evaluations in the last line search. This often happens when you apply L-BFGS-B to non-convex functions, so if the thing you're minimizing is non-convex (and it seems like it might be, just from glancing at the code), I would say this is normal.
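If you want to check programmatically whether a run actually converged, the info dict d that your fmin_l_bfgs_b call already returns (and currently ignores) records why the solver stopped; sketching with the variables from your code:

# d comes from the existing call in get_sharpness
minimum_x, f_x, d = sciopt.fmin_l_bfgs_b(func, init_guess, maxiter=10,
                                         bounds=list(bounds), disp=1, iprint=101)
if d['warnflag'] == 0:
    print('converged:', d['task'])
elif d['warnflag'] == 1:
    print('hit the limit on iterations/function evaluations')
else:
    print('stopped for another reason:', d['task'])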
Upvotes: 1