utlesh singh

Reputation: 35

How to solve logistic regression using gradient descent?

I was solving an exercise from an online Coursera course on machine learning. The problem statement is:

Suppose that a high school has a dataset representing 40 students who were admitted to college and 40 students who were not admitted. Each (x(i), y(i)) training example contains a student's score on two standardized exams and a label of whether the student was admitted.

Our task is to build a binary classification model that estimates college admission chances based on a student's scores on two exams. In the training data,

a. The first column of your x array represents all Test 1 scores, and the second column represents all Test 2 scores.

b. The y vector uses '1' to label a student who was admitted and '0' to label a student who was not admitted.

I have solved it using the predefined function fminunc. Now I am solving it with gradient descent, but my graph of cost vs. number of iterations is not converging, i.e. the cost function value is not decreasing with the number of iterations. My theta values also do not match the answer I should get.
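For reference, the update rule I am trying to implement is the standard batch gradient descent for logistic regression:

    h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}

    J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]

    \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}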

Theta values that I got:

[-0.085260    0.047703    -0.022851]

Theta values that I should get (the expected answer):

[-16.38    0.1483    0.1589]

My source code:

clear; close all; clc
x = load('/home/utlesh/Downloads/ex4x.txt');   % exam scores (m-by-2)
y = load('/home/utlesh/Downloads/ex4y.txt');   % admission labels (m-by-1)
theta = [0,0,0];    % initial parameters (row vector)
alpha = 0.00002;    % learning rate
a = [0,0,0];        % history of theta after each iteration
m = size(x,1);      % number of training examples

x = [ones(m,1) x];  % prepend intercept column of ones
n = size(x,2);      % number of features, including the intercept
y_hyp = y*ones(1,n);  % replicate y across n columns to match x

for kk = 1:100000
  hypothesis = 1 ./ (1 + exp(-(x*theta')));            % sigmoid of x*theta'
  x_hyp = hypothesis*ones(1,n);                        % replicate predictions across n columns
  theta = theta - alpha*(1/m)*sum((x_hyp - y_hyp).*x); % batch gradient step
  a(kk,:) = theta;                                     % record theta for this iteration
end

% recompute the cost at every recorded theta
cost = zeros(100000,1);
for kk = 1:100000
  h = 1 ./ (1 + exp(-(x*a(kk,:)')));
  cost(kk) = sum(-y .* log(h) - (1 - y) .* log(1 - h));
end

x_axis = (1:100000)';

plot(x_axis,cost);
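For reference, I believe the same update can be written in a fully vectorized form (a sketch with theta as a column vector, using the same x, y, alpha, and m as above):

    theta_v = zeros(n,1);                  % column-vector version of theta
    for kk = 1:100000
      h = 1 ./ (1 + exp(-(x*theta_v)));    % m-by-1 predictions
      grad = (x' * (h - y)) / m;           % n-by-1 gradient of the cost
      theta_v = theta_v - alpha * grad;
    end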

The graph that I got looks like that of 1/x.

Please tell me where I am making a mistake. If there is anything I have misunderstood, please let me know.

Upvotes: 0

Views: 468

Answers (1)

farhankhwaja

Reputation: 159

What I can see missing is the use of a learning rate and proper weight initialization. The weights can be adjusted in two modes: online and batch.
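Roughly, the difference between the two modes looks like this (a sketch, not the exact homework code; `data` is the feature matrix, `labels` the target vector, and `rows` the number of examples):

    % online (stochastic) mode: update the weights after every example
    for t = 1:rows
        h = 1 / (1 + exp(-data(t,:) * weights));
        weights = weights + learnRate * (labels(t) - h) * data(t,:)';
    end

    % batch mode: accumulate the gradient, update once per pass
    delta = zeros(size(weights));
    for t = 1:rows
        h = 1 / (1 + exp(-data(t,:) * weights));
        delta = delta + (labels(t) - h) * data(t,:)';
    end
    weights = weights + learnRate * delta;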

The weights should be randomly assigned values in [-0.01, 0.01]. I did a similar exercise as part of a homework during my Master's. Below is the snippet:

Assign initial values to the weights in [-0.01, 0.01]; the number of weights is the number of features + 1:

weights = -.01 + 0.02 * rand(3,1);
learnRate = 0.001;

Here the code runs for a set number of iterations (it also converged within 100 iterations):

    % assumed setup (not shown in the original snippet): numericdata holds
    % the features in columns 1:3 and the label in column 4
    rows = size(numericdata,1);
    cols = size(numericdata,2);
    new_output = zeros(rows,1);
    errorStr = [];
    iter = 1;
    i = 1;

    while iter < 100
        old_output = new_output;
        delta = zeros(cols-1,1);
        for t = 1:rows
            % weighted sum of this example's features
            input = 0;
            for j = 1:cols-1
                input = input + weights(j) * numericdata(t,j);
            end
            new_output(t) = 1 / (1 + exp(-input));   % sigmoid activation
            % accumulate the gradient contribution of this example
            for j = 1:cols-1
                delta(j) = delta(j) + (numericdata(t,4) - new_output(t)) * numericdata(t,j);
            end
        end

        % adjust weights (batch mode): one update per pass over the data
        for j = 1:cols-1
            weights(j) = weights(j) + learnRate * delta(j);
        end

        % record the total absolute error for this iteration
        err = abs(numericdata(:,4) - new_output);
        errorStr(i) = sum(err);
        iter = iter + 1;
        i = i + 1;
    end
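With errorStr holding the total absolute error per pass, you can check convergence by plotting it, e.g.:

    plot(errorStr);   % the error should fall and flatten as the weights converge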

Also, I talked with my professor while studying this. He said that if the given dataset has the property to converge, you will see it when you run the algorithm for different numbers of iterations.

Upvotes: 0
