Petreius

Reputation: 35

Stochastic gradient descent and performance

I'm trying to train a classifier on the MNIST dataset (handwritten digits) and I want to implement a stochastic gradient descent algorithm. Here is the function I wrote:

import numpy
from random import randint

import tls  # my helper module: h (sigmoid hypothesis) and cost

def Stochastic_gradient_descent(theta, y, X, alpha, nIter):
    costs = numpy.zeros([nIter, 1])
    N = y.size
    for i in range(nIter):
        # Pick one random training sample and take a gradient step on it only.
        idx = randint(0, N - 1)
        theta -= alpha * (tls.h(theta, X)[idx] - y[idx]) * X[[idx], :].T
        # Track the cost over the full training set at every iteration.
        costs[i] = (1/N) * tls.cost(theta, y, X)
    return theta, costs

alpha is the step size (learning rate)

h is the sigmoid function of transpose(theta)·X

X is a 50000*785 matrix, where 50000 is the size of the training set and 785 = (size of my image) + 1 (for the constant theta0)
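
For reference, the helpers in tls compute something along these lines (only a sketch so the code above can be read on its own: h is the sigmoid described above, and cost is written here as the standard cross-entropy purely for illustration):

import numpy

def h(theta, X):
    # Sigmoid of X.theta: a column of predicted probabilities, one per row of X.
    return 1.0 / (1.0 + numpy.exp(-numpy.dot(X, theta)))

def cost(theta, y, X):
    # Unnormalised cross-entropy over the whole training set
    # (the caller divides by N, as in the functions in this post).
    p = h(theta, X)
    return -numpy.sum(y * numpy.log(p) + (1 - y) * numpy.log(1 - p))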

The SGD function above runs in roughly 9 seconds for 100 iterations (nIter), i.e. about 100*1*785 multiplications. The classifiers I obtain are satisfactory. I wanted to compare this running time with a batch gradient descent algorithm whose update is:

theta -= alpha * (1/N) * (numpy.dot((tls.h(theta, X) - y).T, X)).T
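
Wrapped in the same kind of loop as the SGD function (sketched here for completeness, with the same cost bookkeeping), this is:

def gradient_descent(theta, y, X, alpha, nIter):
    costs = numpy.zeros([nIter, 1])
    N = y.size
    for i in range(nIter):
        # One update per iteration, using all N samples through a single vectorized dot product.
        theta -= alpha * (1/N) * numpy.dot((tls.h(theta, X) - y).T, X).T
        costs[i] = (1/N) * tls.cost(theta, y, X)
    return theta, costs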

This batch version runs in roughly 12 seconds for 100 iterations (nIter), i.e. about 100*50000*785 multiplications, since (h(theta,X) - y) is a 50000*1 vector. The classifiers I obtain are also satisfactory, but I am surprised that this code is not much slower than the first one. I understand that vectorization plays an important role in the dot function, but I would have expected worse performance. Is there a way to improve the performance of my stochastic gradient descent?

Thank you for your help.

Upvotes: 2

Views: 585

Answers (1)

Salva Carrión

Reputation: 540

As far as I know, vectorization is the simplest way to improve the performance of SGD. There are a few other things you can try: coding a Cython version, using mini-batches of several samples (they tend to average out the "noise" of single samples), or experimenting with different stopping criteria such as early stopping, close-enough-to-zero, or threshold stopping.
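
For example, a mini-batch variant of the function in the question could look roughly like this (a sketch only, reusing the question's tls helpers; the batch_size parameter and the gradient averaging are the only new parts):

import numpy
import tls  # the question's helper module with h and cost

def minibatch_gradient_descent(theta, y, X, alpha, nIter, batch_size=32):
    costs = numpy.zeros([nIter, 1])
    N = y.size
    for i in range(nIter):
        # A small random batch averages out the noise of a single sample
        # while staying much cheaper than a full pass over all N samples.
        idx = numpy.random.randint(0, N, size=batch_size)
        Xb, yb = X[idx, :], y[idx]
        theta -= alpha * numpy.dot((tls.h(theta, Xb) - yb).T, Xb).T / batch_size
        costs[i] = (1/N) * tls.cost(theta, y, X)
    return theta, costs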

If your aim is to implement some ML algorithms or optimization routines yourself to learn how they work, then perfect, keep at it. But if you want to work in a professional way, you should use the already-optimized (and well-tested) libraries.
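
For instance (just one possible choice, and only a sketch that assumes a reasonably recent scikit-learn), an SGD-trained linear classifier is already available there:

from sklearn.linear_model import SGDClassifier

# loss="log_loss" gives logistic regression trained with SGD; note that alpha here
# would be the regularization strength, not the learning rate from the question.
clf = SGDClassifier(loss="log_loss", max_iter=100, tol=None, fit_intercept=False)
clf.fit(X, y.ravel())  # fit_intercept=False because X already contains the constant column
predictions = clf.predict(X)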

P.S. Libraries like Caffe, Torch, Theano, Neon (Nervana), etc. have some really complex and almost-magical optimizations that allow them to reach very high performance, besides their GPU support.

A benchmark of the ImageNet-winning models implemented in some of the most popular libraries: https://github.com/soumith/convnet-benchmarks

Upvotes: 1
