Petreius

Reputation: 35

Stochastic gradient descent and performance

I'm trying to train a classifier on the MNIST dataset (handwritten digits) and I want to implement a stochastic gradient descent algorithm. Here is the function I wrote:

import numpy
from random import randint

import tls  # my helper module: h (sigmoid hypothesis) and cost

def Stochastic_gradient_descent(theta, y, X, alpha, nIter):
    costs = numpy.zeros([nIter, 1])
    N = y.size
    for i in range(nIter):
        # Pick one random training sample and take a gradient step on it only.
        idx = randint(0, N - 1)
        theta -= alpha * (tls.h(theta, X)[idx] - y[idx]) * X[[idx], :].T
        # Track the cost over the full training set at every iteration.
        costs[i] = (1/N) * tls.cost(theta, y, X)
    return theta, costs

alpha is the step size (learning rate)

h is the sigmoid function of transpose(theta)·X

X is a 50000*785 matrix, where 50000 is the size of the training set and 785 = (size of my image) + 1 (for the constant theta0)
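
For reference, the helpers in tls compute something along these lines (only a sketch so the code above can be read on its own: h is the sigmoid described above, and cost is written here as the standard cross-entropy purely for illustration):

import numpy

def h(theta, X):
    # Sigmoid of X.theta: a column of predicted probabilities, one per row of X.
    return 1.0 / (1.0 + numpy.exp(-numpy.dot(X, theta)))

def cost(theta, y, X):
    # Unnormalised cross-entropy over the whole training set
    # (the caller divides by N, as in the functions in this post).
    p = h(theta, X)
    return -numpy.sum(y * numpy.log(p) + (1 - y) * numpy.log(1 - p))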

The SGD function above runs in roughly 9 seconds for 100 iterations (nIter), i.e. about 100*1*785 multiplications. The classifiers I obtain are satisfactory. I wanted to compare this running time with a batch gradient descent algorithm whose update is:

theta -= alpha * (1/N) * (numpy.dot((tls.h(theta, X) - y).T, X)).T
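
Wrapped in the same kind of loop as the SGD function (sketched here for completeness, with the same cost bookkeeping), this is:

def gradient_descent(theta, y, X, alpha, nIter):
    costs = numpy.zeros([nIter, 1])
    N = y.size
    for i in range(nIter):
        # One update per iteration, using all N samples through a single vectorized dot product.
        theta -= alpha * (1/N) * numpy.dot((tls.h(theta, X) - y).T, X).T
        costs[i] = (1/N) * tls.cost(theta, y, X)
    return theta, costs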

This batch version runs in roughly 12 seconds for 100 iterations (nIter), i.e. about 100*50000*785 multiplications, since (h(theta,X) - y) is a 50000*1 vector. The classifiers I obtain are also satisfactory, but I am surprised that this code is not much slower than the first one. I understand that vectorization plays an important role in the dot function, but I would have expected worse performance. Is there a way to improve the performance of my stochastic gradient descent?

Thank you for your help.

Upvotes: 2

Views: 585

Answers (1)

Salva Carrión

Reputation: 540

As far as I know, vectorization is the simplest way to improve the performance of SGD. There are a few other things you can try: coding a Cython version, using mini-batches of several samples (they tend to average out the "noise" of single samples), or experimenting with different stopping criteria such as early stopping, close-enough-to-zero, or threshold stopping.
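
For example, a mini-batch variant of the function in the question could look roughly like this (a sketch only, reusing the question's tls helpers; the batch_size parameter and the gradient averaging are the only new parts):

import numpy
import tls  # the question's helper module with h and cost

def minibatch_gradient_descent(theta, y, X, alpha, nIter, batch_size=32):
    costs = numpy.zeros([nIter, 1])
    N = y.size
    for i in range(nIter):
        # A small random batch averages out the noise of a single sample
        # while staying much cheaper than a full pass over all N samples.
        idx = numpy.random.randint(0, N, size=batch_size)
        Xb, yb = X[idx, :], y[idx]
        theta -= alpha * numpy.dot((tls.h(theta, Xb) - yb).T, Xb).T / batch_size
        costs[i] = (1/N) * tls.cost(theta, y, X)
    return theta, costs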

If your aim is to implement some ML algorithms or optimization routines yourself to learn how they work, then perfect, keep at it. But if you want to work in a professional way, you should use the already-optimized (and well-tested) libraries.
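
For instance (just one possible choice, and only a sketch that assumes a reasonably recent scikit-learn), an SGD-trained linear classifier is already available there:

from sklearn.linear_model import SGDClassifier

# loss="log_loss" gives logistic regression trained with SGD; note that alpha here
# would be the regularization strength, not the learning rate from the question.
clf = SGDClassifier(loss="log_loss", max_iter=100, tol=None, fit_intercept=False)
clf.fit(X, y.ravel())  # fit_intercept=False because X already contains the constant column
predictions = clf.predict(X)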

P.S. Libraries like Caffe, Torch, Theano, Neon (Nervana), etc. have some really complex and almost-magical optimizations that allow them to reach very high performance, besides their GPU support.

A benchmark of the ImageNet-winning models implemented in some of the most popular libraries: https://github.com/soumith/convnet-benchmarks

Upvotes: 1
