sam

Reputation: 427

Neural net optimization failing (using Scipy fmin_cg)

Just a bit of context: I'm attempting to implement a 3-layer neural network (1 hidden layer) for image classification on the CIFAR-10 dataset. I've implemented backpropagation and originally tried training the network with plain gradient descent, but my cost plateaued around 40 or so (which classifies new images at roughly the same rate as guessing randomly; virtually pointless).

I then attempted to train the network using the scipy.optimize.fmin_cg function. I pass the unrolled weights into the function, and my backpropagation function returns a gradient vector of the same size, which satisfies the function's input requirements.

My implementation of the function looks like the following:

scipy.optimize.fmin_cg(cost, iw, fprime=fit_slim)

where the fit and fit_slim functions are defined as follows:

def fit(X, Y, w, l, predict=False, x=None):
    w_grad = ([np.mat(np.zeros(np.shape(w[i]))) 
              for i in range(len(w))])
    for i in range(len(X)):
        x = x if predict else X[i]  # use the supplied example when predicting
        y = Y[i]
        # forward propagate
        a = x
        a_s = []
        for j in range(len(w)):
            a = np.mat(np.append(1, a)).T  # prepend the bias unit; column vector
            a_s.append(a)
            z = w[j] * a
            a = sigmoid(z)
        if predict: return a
        # backpropagate
        delta = a - y.T
        w_grad[-1] += delta * a_s[-1].T
        for j in reversed(range(1, len(w))):
            delta = delta[1:] if j != len(w)-1 else delta  # drop the bias error except at the output layer
            delta = np.multiply(w[j].T*delta, s_prime(a_s[j]))
            w_grad[j-1] += (delta[1:] * a_s[j-1].T)
    # average the gradients and add regularization (bias column excluded)
    for i in range(len(w)):
        w_grad[i] /= len(X)
        w_grad[i][:,1:] += (l/len(X)) * w[i][:,1:]
    return flatten(w_grad).T

def fit_slim(iw):
    iw = shape_back(iw)
    return fit(X, Y, iw, l)

And the cost function is:

def cost(iw):
    J = 0
    m = len(X)
    iw = shape_back(iw)
    for i in range(m):
        h = fit(X, Y, iw, l, True, X[i])
        J += ((1.0/m)*(np.sum((np.multiply(-Y[i],np.log(h))-
              np.multiply((1-Y[i]),np.log(1-h))).flatten())))
    # regularization term over the reshaped weights
    for i in range(len(iw)):
        J += np.sum(((l/(2.0*m))*np.power(iw[i],2)).flatten())
    return J

The iw variable holds the weights unrolled into a single long vector, and the shape_back function simply reshapes iw back into the original matrix shapes for use in the fit and cost functions.
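For reference, the two helpers look roughly like this (a sketch; the exact layer shapes are illustrative, though with 200 hidden units the total 200*3073 + 10*201 = 616,610 matches the vector size in the traceback below):

shapes = [(200, 3073), (10, 201)]  # (hidden, input+bias), (output, hidden+bias)

def flatten(ws):
    # concatenate all weight matrices into a single 1 x n matrix
    return np.mat(np.concatenate([np.asarray(ws[i]).ravel()
                                  for i in range(len(ws))]))

def shape_back(iw):
    # slice the long vector back into the original matrix shapes
    ws, start = [], 0
    for r, c in shapes:
        ws.append(np.mat(np.asarray(iw).ravel()[start:start+r*c]).reshape(r, c))
        start += r * c
    return ws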

The first issue I face is that my fit function takes forever to run a single iteration; by forever, I mean about a minute per pass, which seems very slow. Nevertheless, I've let it run until the cost plateaus at around 40, as I mentioned, which is still a very high cost. That said, trying an alternative optimization technique seemed reasonable to me, and I settled on the fmin_cg function.

When I run it, I receive the following error:

  File "image_net.py", line 143, in <module>
    print scipy.optimize.fmin_cg(cost, iw, fprime=fit_slim, maxiter=2,  callback=callback)
  File "/Users/samgriesemer/anaconda/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 1092, in fmin_cg
    res = _minimize_cg(f, x0, args, fprime, callback=callback, **opts)
  File "/Users/samgriesemer/anaconda/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 1156, in _minimize_cg
    deltak = numpy.dot(gfk, gfk)
ValueError: shapes (616610,1) and (616610,1) not aligned: 1 (dim 1) != 616610 (dim 0)

It seems to me that the function is attempting to take the dot product of the same vector, which doesn't make any sense to me.

So to recap my question, I have two issues.

1) Is there anything I can do to better optimize my fit function? My data set has 10,000 examples, so I understand it takes time to loop through all of them, but I don't understand why, even after many iterations, my cost is still very high.

2) Why am I receiving an error when running the fmin_cg function? The arguments I'm passing to the function are vectors of the same size. I don't understand why it would attempt to take the dot product of a vector with itself inside the function.

Many thanks to anyone who can shed some light on these issues/misunderstandings.

Upvotes: 1

Views: 262

Answers (1)

ev-br

Reputation: 26040

It seems to me that the function is attempting to take the dot product of the same vector, which doesn't make any sense to me.

This is not how numpy.dot works. The problem is exactly what the error message says: it tries to perform a matrix multiplication and fails because the dimensions do not match.

Notice that for arrays which one could think of as "one-dimensional", numpy distinguishes between shapes (n,), (n, 1) and (1, n): only the first one is one-dimensional for numpy, and it's not interpreted as a row or a column vector.

>>> a = np.ones(3)      # a 1D array
>>> np.dot(a, a)
3.0
>>> b = a.reshape(-1, 1)   # a column vector
>>> b
array([[ 1.],
       [ 1.],
       [ 1.]])
>>> np.dot(b, b)           # column times column, fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: shapes (3,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)
>>> np.dot(b, b.T)        # column times row, i.e. an outer product
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
>>> np.dot(b.T, b)        # row times column, but notice the dimensions
array([[ 3.]])            
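In your case this is exactly what happens: fit returns flatten(w_grad).T, an (n, 1) matrix, and fmin_cg then computes numpy.dot(gfk, gfk) on it, i.e. column times column. fmin_cg expects the starting point and the gradient to be plain 1-D arrays. A minimal sketch of a fix, assuming the flatten/shape_back helpers from the question:

def fit_slim(iw):
    grads = fit(X, Y, shape_back(iw), l)
    return np.asarray(grads).ravel()   # shape (n,), not (n, 1)

x0 = np.asarray(iw).ravel()            # 1-D starting point
res = scipy.optimize.fmin_cg(cost, x0, fprime=fit_slim)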

Upvotes: 1
