Paler

Reputation: 349

Batch gradient descent algorithm does not converge

I'm trying to implement the batch gradient descent algorithm for my machine learning homework. I have a training set whose x values are around 10^3 and y values are around 10^6. I'm trying to find the values of [theta0, theta1] that make y = theta0 + theta1 * x converge. I set the learning rate to 0.0001 and the maximum number of iterations to 10. Here's my code in Qt.

QVector<double> gradient_descent_batch(QVector<double> x, QVector<double>y)
{
    QVector<double> theta(0);
    theta.resize(2);

    int size = x.size();

    theta[1] = 0.1;
    theta[0] = 0.1;

    for (int j=0;j<MAX_ITERATION;j++)
    {
        double dJ0 = 0.0;
        double dJ1 = 0.0;

        for (int i=0;i<size;i++)
        {
            dJ0 += (theta[0] + theta[1] * x[i] - y[i]);
            dJ1 += (theta[0] + theta[1] * x[i] - y[i]) * x[i];
        }

        double theta0 = theta[0];
        double theta1 = theta[1];
        theta[0] = theta0 - LRATE * dJ0;
        theta[1] = theta1 - LRATE * dJ1;

        if (qAbs(theta0 - theta[0]) < THRESHOLD && qAbs(theta1 - theta[1]) < THRESHOLD)
            return theta;
    }

    return theta;
}

I print the value of theta on every iteration, and here's the result.

QVector(921495, 2.29367e+09) 
QVector(-8.14503e+12, -1.99708e+16) 
QVector(7.09179e+19, 1.73884e+23) 
QVector(-6.17475e+26, -1.51399e+30) 
QVector(5.3763e+33, 1.31821e+37) 
QVector(-4.68109e+40, -1.14775e+44) 
QVector(4.07577e+47, 9.99338e+50) 
QVector(-3.54873e+54, -8.70114e+57) 
QVector(3.08985e+61, 7.57599e+64) 
QVector(-2.6903e+68, -6.59634e+71) 

It seems that theta will never converge. I followed the solution here and set the learning rate to 0.00000000000001 and the maximum number of iterations to 20, but it still does not converge. Here's the result.

QVector(0.100092, 0.329367) 
QVector(0.100184, 0.558535) 
QVector(0.100276, 0.787503) 
QVector(0.100368, 1.01627) 
QVector(0.10046, 1.24484) 
QVector(0.100552, 1.47321) 
QVector(0.100643, 1.70138) 
QVector(0.100735, 1.92936) 
QVector(0.100826, 2.15713) 
QVector(0.100918, 2.38471) 
QVector(0.101009, 2.61209) 
QVector(0.1011, 2.83927) 
QVector(0.101192, 3.06625) 
QVector(0.101283, 3.29303) 
QVector(0.101374, 3.51962) 
QVector(0.101465, 3.74601) 
QVector(0.101556, 3.9722) 
QVector(0.101646, 4.1982) 
QVector(0.101737, 4.424) 
QVector(0.101828, 4.6496) 

What's wrong?

Upvotes: 0

Views: 483

Answers (1)

yfy

Reputation: 48

Firstly, your algorithm seems fine, except that you should divide the accumulated gradient by size:

theta[0] = theta0 - LRATE * dJ0 / size;
theta[1] = theta1 - LRATE * dJ1 / size;

I would also suggest calculating the cost function and monitoring it:

    J(theta0, theta1) = 1/(2m) * sum_{i=1}^{m} (theta0 + theta1 * x[i] - y[i])^2

Your cost should decrease on every iteration. If it bounces back and forth, you are using too large a learning rate. I would suggest using 0.01 and doing 400 iterations.

Upvotes: 1
