IwantToLearnQuickly

Reputation: 21

Linear regression gradient descent using C#

I'm taking the Coursera machine learning course right now and I can't get my gradient descent linear regression function to minimize. I use one dependent variable, an intercept, and four values of x and y, so the equations are fairly simple. The final value of the gradient descent equation varies wildly depending on the initial values of alpha and beta, and I can't figure out why. I've only been coding for about two weeks, so my knowledge is limited, to say the least; please keep this in mind if you take the time to help.

using System;
namespace LinearRegression
{
class Program
{
    static void Main(string[] args)
    {
        Random rnd = new Random();
        const int N = 4;

        //We randomize the initial values of alpha and beta
        double theta1 = rnd.Next(0, 100);
        double theta2 = rnd.Next(0, 100);

        //Values of x, i.e. the independent variable
        double[] x = new double[N] { 1, 2, 3, 4 };
        //Values of y, i.e. the dependent variable
        double[] y = new double[N] { 5, 7, 9, 12 };
        double sumOfSquares1;
        double sumOfSquares2;
        double temp1;
        double temp2;
        double sum;
        double learningRate = 0.001;
        int count = 0;

        do
        {
            //We reset the Generalized cost function, called sum of squares 
            //since I originally used SS to 
            //determine if the function was minimized
            sumOfSquares1 = 0;
            sumOfSquares2 = 0;
            //Adding 1 to counter for each iteration to keep track of how 
            //many iterations are completed thus far
            count += 1;

            //First we calculate the Generalized cost function, which is
            //to be minimized
            sum = 0;
            for (int i = 0; i < (N - 1); i++)
            {
                sum += Math.Pow((theta1 + theta2 * x[i] - y[i]), 2);
            }
            //Since we have 4 values of x and y we have 1/(2*N) = 1/8 = 0.125
            sumOfSquares1 = 0.125 * sum;

            //Then we calculate the new alpha value, using the derivative of 
            //the cost function. 
            sum = 0;
            for (int i = 0; i < (N - 1); i++)
            {
                sum += theta1 + theta2 * x[i] - y[i];
            }
            //Since we have 4 values of x and y we have 1/N = 1/4 = 0.25
            temp1 = theta1 - learningRate * 0.25 * sum;

            //Same for the beta value, it has a different derivative
            sum = 0;
            for (int i = 0; i < (N - 1); i++)
            {
                sum += (theta1 + theta2 * x[i]) * x[i] - y[i];
            }
            temp2 = theta2 - learningRate * 0.25 * sum;

            //We change the values of alpha and beta at the same time, otherwise the 
            //function won't work                
            theta1 = temp1;
            theta2 = temp2;

            //We then calculate the cost function again, with new alpha and beta values 
            sum = 0;
            for (int i = 0; i < (N - 1); i++)
            {
                sum += Math.Pow((theta1 + theta2 * x[i] - y[i]), 2);
            }
            sumOfSquares2 = 0.125 * sum;

            Console.WriteLine("Alpha: {0:N}", theta1);
            Console.WriteLine("Beta: {0:N}", theta2);
            Console.WriteLine("GCF Before: {0:N}", sumOfSquares1);
            Console.WriteLine("GCF After: {0:N}", sumOfSquares2);
            Console.WriteLine("Iterations: {0}", count);
            Console.WriteLine(" ");

        } while (sumOfSquares2 <= sumOfSquares1 && count < 5000);
        //we end the iteration cycle once the generalized cost function
        //cannot be reduced any further or after 5000 iterations            
        Console.ReadLine();
    }
}
}

Upvotes: 2

Views: 3145

Answers (1)

Péter Szilvási

Reputation: 2079

There are two bugs in the code.

  • First, I assume that you would like to iterate through all the elements in the array, so rework the for loop like this: for (int i = 0; i < N; i++)
  • Second, when updating the theta2 value, the summation is not calculated correctly. According to the update rule it should look like this: sum += (theta1 + theta2 * x[i] - y[i]) * x[i];
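Applied to the question's program, both fixes together might look like the sketch below. The `Fit` method, its default parameters, and the zero initial theta values are my own choices for illustration, not from the original code:

```csharp
using System;

class GradientDescentFix
{
    //Returns (intercept, slope) after running gradient descent with both fixes applied
    public static (double, double) Fit(double[] x, double[] y,
                                       double learningRate = 0.01, int iterations = 50000)
    {
        int n = x.Length;
        double theta1 = 0, theta2 = 0;   //intercept and slope

        for (int iter = 0; iter < iterations; iter++)
        {
            double sum1 = 0, sum2 = 0;
            //Fix 1: iterate over all n elements (i < n, not i < n - 1)
            for (int i = 0; i < n; i++)
            {
                double error = theta1 + theta2 * x[i] - y[i];
                sum1 += error;
                //Fix 2: multiply the whole residual by x[i]
                sum2 += error * x[i];
            }
            //Simultaneous update, scaled by 1/n
            theta1 -= learningRate * sum1 / n;
            theta2 -= learningRate * sum2 / n;
        }
        return (theta1, theta2);
    }

    static void Main()
    {
        var (intercept, slope) = Fit(new double[] { 1, 2, 3, 4 },
                                     new double[] { 5, 7, 9, 12 });
        Console.WriteLine($"intercept = {intercept:F3}, slope = {slope:F3}");
    }
}
```

With the sample data this converges to an intercept near 2.5 and a slope near 2.3, matching the least-squares fit.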


Why do the final values depend on the initial values?

Because the gradient descent update step is calculated from these values. If the initial values are too big or too small, the starting point is too far away from the final values, and more steps are needed to reach them. You could solve this problem by:

  1. Increasing the number of iterations (e.g. 5000 to 50000): the gradient descent algorithm has more time to converge.
  2. Increasing the learning rate (e.g. 0.001 to 0.01): the gradient descent update steps are bigger, therefore it converges faster. Note: if the learning rate is too large, then it is possible to step over the global minimum.
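As a rough illustration of point 2, the sketch below uses the corrected loops from the bullet points above and counts how many iterations pass before the cost stops improving; the convergence check and the 1e-12 tolerance are my own additions, not from the answer:

```csharp
using System;

class LearningRateDemo
{
    //Runs the corrected gradient descent on the question's data and reports how many
    //iterations it takes for the cost to stop improving by more than a tiny tolerance
    public static int IterationsToConverge(double learningRate, int maxIter = 200000)
    {
        double[] x = { 1, 2, 3, 4 };
        double[] y = { 5, 7, 9, 12 };
        double theta1 = 0, theta2 = 0;
        double prevCost = double.MaxValue;

        for (int iter = 1; iter <= maxIter; iter++)
        {
            double sum1 = 0, sum2 = 0, cost = 0;
            for (int i = 0; i < x.Length; i++)
            {
                double error = theta1 + theta2 * x[i] - y[i];
                sum1 += error;
                sum2 += error * x[i];
                cost += error * error;
            }
            theta1 -= learningRate * sum1 / x.Length;
            theta2 -= learningRate * sum2 / x.Length;

            //Stop once the cost barely changes between iterations
            if (prevCost - cost < 1e-12) return iter;
            prevCost = cost;
        }
        return maxIter;
    }

    static void Main()
    {
        Console.WriteLine(IterationsToConverge(0.001));
        Console.WriteLine(IterationsToConverge(0.01));
    }
}
```

The larger learning rate needs far fewer iterations to converge on this data, which is the point the answer is making.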

The intercept (theta1) is around 2.5 and the slope (theta2) is around 2.3 for the given data. I have created a GitHub project that fixes your code, and I have also added a shorter solution using LINQ; it is five lines of code. If you are curious, check it out here.
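I can't verify what the linked project contains, but a short closed-form least-squares version using LINQ could look like this; the class and method names are my own:

```csharp
using System;
using System.Linq;

class LinqLeastSquares
{
    //Closed-form least squares: slope = Σ(x-mx)(y-my) / Σ(x-mx)², intercept = my - slope*mx
    public static (double Intercept, double Slope) Fit(double[] x, double[] y)
    {
        double mx = x.Average(), my = y.Average();
        double slope = x.Zip(y, (a, b) => (a - mx) * (b - my)).Sum()
                     / x.Sum(a => (a - mx) * (a - mx));
        return (my - slope * mx, slope);
    }

    static void Main()
    {
        var (intercept, slope) = Fit(new double[] { 1, 2, 3, 4 },
                                     new double[] { 5, 7, 9, 12 });
        Console.WriteLine($"intercept = {intercept:F2}, slope = {slope:F2}");
    }
}
```

Unlike gradient descent, this computes the exact minimizer directly, so there is no learning rate or iteration count to tune.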

Upvotes: 2
