I am learning machine learning on Coursera, but I'm a little confused about the difference between gradient descent and the cost function. When and where should I use each of them?
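J(ϴ) is the cost function: it measures how well a given choice of the parameters ϴ fits the training data. One way to minimize it is trial and error, i.e. trying lots of values of ϴ and checking the resulting J(ϴ). In practice this means the work is done by hand and is time-consuming.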
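The cost function itself does not minimize anything; it only scores a candidate ϴ. The trial-and-error approach can be sketched as below, assuming univariate linear regression with hypothesis theta0 + theta1 * x and the course-style cost J = (1/2m) * sum of squared errors. The toy data, the grid of candidates, and the name `cost` are all hypothetical, chosen only for illustration.

```python
# Hypothetical toy data that exactly follows y = 2x + 1 (an assumption for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

def cost(theta0, theta1):
    """J(theta) for univariate linear regression: (1 / 2m) * sum of squared errors."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Trial and error: evaluate J on a hand-picked grid of candidate parameters
# (steps of 0.5) and keep whichever pair gives the smallest cost.
candidates = [(t0 / 2, t1 / 2) for t0 in range(-4, 5) for t1 in range(-4, 9)]
best = min(candidates, key=lambda t: cost(*t))
print(best, cost(*best))  # prints (1.0, 2.0) 0.0
```

Even with only two parameters this takes 117 cost evaluations, and it only works because the grid happens to contain the true answer; with more parameters or a finer grid the number of evaluations grows very quickly.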
Gradient descent automates that search: it changes the theta values, or parameters, bit by bit, until it (hopefully) arrives at a minimum of J(ϴ). It is an iterative method in which each step moves ϴ in the direction of steepest descent of J(ϴ), i.e. along the negative gradient, toward a value of theta where the cost is lower.
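A minimal sketch of that iterative update, assuming the same univariate linear-regression cost J = (1/2m) * sum of squared errors on hypothetical toy data. Each step applies theta_j := theta_j - alpha * dJ/dtheta_j to both parameters simultaneously; the learning rate `alpha`, the step count, and the helper names are illustrative choices, not values from the course.

```python
# Hypothetical toy data that exactly follows y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

def cost(theta0, theta1):
    """J(theta) = (1 / 2m) * sum of squared errors."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def gradients(theta0, theta1):
    """Partial derivatives of J with respect to theta0 and theta1."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d0 = sum(errors) / m
    d1 = sum(e * x for e, x in zip(errors, xs)) / m
    return d0, d1

def gradient_descent(alpha=0.05, steps=5000):
    theta0, theta1 = 0.0, 0.0  # arbitrary starting point
    for _ in range(steps):
        d0, d1 = gradients(theta0, theta1)
        # Simultaneous update: step against the gradient, scaled by the learning rate.
        theta0, theta1 = theta0 - alpha * d0, theta1 - alpha * d1
    return theta0, theta1

theta0, theta1 = gradient_descent()
print(round(theta0, 3), round(theta1, 3), cost(theta0, theta1))
```

Unlike the grid search, nobody has to guess where the answer lies: the gradient tells each step which way is downhill. If `alpha` is too large the updates overshoot and diverge, and if it is too small convergence is slow, so the learning rate still needs some tuning.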
Why use gradient descent? It is easy to implement and is a generic optimization technique, so it will still work if you change your model. It is also the better choice when you have a lot of features, because in that case computing the optimal ϴ directly with the normal equation becomes very expensive (it requires inverting an n × n matrix, which costs roughly O(n³) for n features).
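To illustrate the "generic" point, here is a sketch of a gradient descent routine that knows nothing about linear regression: it only needs a function returning the gradient. The objective f(a, b) = (a - 3)^2 + 2(b + 1)^2 and the names `gradient_descent` and `grad_f` are hypothetical, picked so the minimum, (3, -1), is easy to check.

```python
from typing import Callable, List

def gradient_descent(grad: Callable[[List[float]], List[float]],
                     start: List[float],
                     alpha: float = 0.1,
                     steps: int = 1000) -> List[float]:
    """Generic gradient descent: works for any differentiable cost, given its gradient."""
    params = list(start)
    for _ in range(steps):
        g = grad(params)
        # Move every parameter a small step against its partial derivative.
        params = [p - alpha * gi for p, gi in zip(params, g)]
    return params

# A different "model": minimize f(a, b) = (a - 3)^2 + 2 * (b + 1)^2,
# whose gradient is (2(a - 3), 4(b + 1)). The minimum is at (3, -1).
def grad_f(params: List[float]) -> List[float]:
    a, b = params
    return [2 * (a - 3), 4 * (b + 1)]

result = gradient_descent(grad_f, [0.0, 0.0])
print(result)
```

Swapping in a different model only means supplying a different gradient function; the update loop stays the same. A closed-form solution like the normal equation, by contrast, exists only for specific models such as linear regression.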
Gradient descent requires a cost function (there are many types of cost functions). One commonly used cost function is the mean squared error, which measures the average squared difference between the observed values in the dataset and the estimated values (the predictions).
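A minimal sketch of mean squared error in plain Python; the function name and the sample numbers are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical observations and model predictions.
observed = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 8.0]
error = mean_squared_error(observed, predicted)  # (0.25 + 0.0 + 1.0) / 3, about 0.4167
print(error)
```

Squaring keeps every term non-negative and penalizes large misses more heavily than small ones. The Coursera course writes the linear-regression cost as 1/(2m) times the sum of squared errors; the extra factor of 1/2 only simplifies the derivative and does not change which parameters minimize it.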
We need the cost function because it is the quantity we minimize. Minimizing a function means finding its lowest point, the deepest valley in that function. Keep in mind that the cost function measures the error in an ML model's predictions, so minimizing it means getting to the lowest achievable error, which improves the model's accuracy. Concretely, we reduce the error by iterating over a training dataset while tweaking the parameters (the weights and biases) of the model.
In short, the whole point of gradient descent is to minimize the cost function.
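Putting the two pieces together, the sketch below fits a single weight `w` and bias `b` by gradient descent on MSE and records the cost at every epoch, which is how the cost function is used to monitor training. The data (generated from y = 3x - 2), the learning rate, the epoch count, and the name `train` are hypothetical choices for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def train(xs, ys, learning_rate=0.05, epochs=1000):
    """Fit y ~ w * x + b by gradient descent on MSE, recording the cost each epoch."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):
        preds = [w * x + b for x in xs]
        history.append(mse(ys, preds))  # monitor the error before each update
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        dw = (2 / n) * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
        db = (2 / n) * sum(p - y for p, y in zip(preds, ys))
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b, history

# Hypothetical data generated from y = 3x - 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-2.0, 1.0, 4.0, 7.0, 10.0]
w, b, history = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, first cost={history[0]:.3f}, last cost={history[-1]:.6f}")
```

Plotting or printing `history` is a practical check: with a suitable learning rate the cost should fall steadily, and a cost that grows from epoch to epoch usually means the learning rate is too large.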