Reputation: 2710
I know the solution, but I don't understand how the following equation was translated to code.

J(theta) = -(1/m) * sum_{i=1}^{m} [ y^(i) * log(h(x^(i))) + (1 - y^(i)) * log(1 - h(x^(i))) ]
Solution
grad = (1/m) * ((sigmoid(X * theta)-y)' * X);
Upvotes: 2
Views: 469
Reputation: 2202
As has been said, the mathematical expression you've posted is the cost function, whereas the code snippet you show is the gradient.
However, the summation is not missing. Let's break it down.
The gradient of the cost function with respect to the j-th parameter is

d J(theta) / d theta_j = (1/m) * sum_{i=1}^{m} (sigmoid(theta' * x^(i)) - y^(i)) * x_j^(i)
With X * theta you get a vector containing the dot product of each of your data points with your parameter vector.

With sigmoid(X * theta) you evaluate the sigmoid of each of those dot products.

With sigmoid(X * theta) - y you get a vector containing the differences between all your predictions and the actual labels.

With (sigmoid(X * theta) - y)' * X you transpose that vector of differences and compute its dot product with each of the columns of your data set (i.e. with the x_j's of every data point).

Think about it for a second, and you'll see that this is exactly the summation in the expression above, but evaluated for all the entries of your parameter vector at once, not just for a single j.
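To make that concrete, here is a small Octave/MATLAB sketch (the toy values for X, y and theta are made up, and sigmoid is defined inline as an anonymous function) that computes the gradient once with the vectorized expression from the question and once with an explicit loop over the summation:

% Toy data: m = 4 examples, n = 3 features (made-up values)
X = [1 2 0; 1 0 1; 1 3 2; 1 1 1];
y = [1; 0; 1; 0];
theta = [0.1; -0.2; 0.3];
m = length(y);

sigmoid = @(z) 1 ./ (1 + exp(-z));   % elementwise logistic function

% Vectorized gradient (the code from the question)
grad_vec = (1/m) * ((sigmoid(X * theta) - y)' * X);

% The same gradient written as an explicit summation over the m examples
grad_loop = zeros(1, size(X, 2));
for i = 1:m
    h_i = sigmoid(X(i, :) * theta);              % prediction for example i
    grad_loop = grad_loop + (h_i - y(i)) * X(i, :);
end
grad_loop = grad_loop / m;

disp(grad_vec);
disp(grad_loop);   % identical (up to floating-point error)

Both lines print the same 1 x 3 row vector, one entry per parameter.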
Upvotes: 1
Reputation: 40889
The original line J(theta) represents the cost function for logistic regression.
The code that you showed, grad = ..., is the gradient of J(theta) with respect to the parameters; that is, grad is an implementation of d/dtheta J(theta). The derivative matters because gradient descent uses it to move the parameters toward their optimal values, i.e. to minimize the cost J(theta).
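As a rough illustration (a minimal sketch with made-up toy data and an assumed learning rate, not the actual exercise code), a bare-bones gradient descent loop repeatedly subtracts a scaled version of this gradient from theta:

% Minimal gradient descent sketch (toy data; alpha and the iteration count are assumed values)
sigmoid = @(z) 1 ./ (1 + exp(-z));

X = [1 2 0; 1 0 1; 1 3 2; 1 1 1];   % m = 4 examples, n = 3 features
y = [1; 0; 1; 0];
theta = zeros(3, 1);
m = length(y);
alpha = 0.1;                         % learning rate (assumed)

for iter = 1:400
    grad = (1/m) * ((sigmoid(X * theta) - y)' * X);  % the gradient from the question
    theta = theta - alpha * grad';                   % note the transpose: grad is a row vector
end
disp(theta);

Each iteration moves theta a small step in the direction that decreases J(theta), which is why the code needs the gradient rather than the cost itself.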
The formula for the gradient, taken from the first link below, is

d J(theta) / d theta_j = (1/m) * sum_{i=1}^{m} (h(x^(i)) - y^(i)) * x_j^(i)

Note that J(theta) is the same as your formula above and h(x) represents the sigmoid function.
The total gradient over all training examples requires a summation over m. In your code for grad above, that summation is carried out by the matrix product: (sigmoid(X * theta) - y)' is a 1 x m row vector, and multiplying it by the m x n matrix X adds up the per-example terms for every parameter at once. So the code is computing the full (batch) gradient over all training examples, not the single-example gradient used in stochastic gradient descent.
For more information, you can google for "logistic regression cost function derivative", which leads to these links:
This one in particular has everything you need: http://feature-space.com/2011/10/28/logistic-cost-function-derivative/
These are apparently some lecture notes from Andrew Ng's class on machine learning and logistic regression with gradient descent: http://www.holehouse.org/mlclass/06_Logistic_Regression.html
Explanation of how to compute the derivative step-by-step: https://math.stackexchange.com/questions/477207/derivative-of-cost-function-for-logistic-regression
Upvotes: 1