Reputation: 27946
I recently learned about "gradient checking", an algorithm for making sure that the derivatives computed by my neural network's back-propagation are correct.
The course I learned it from, and many other sources such as this one, claim that it is much slower than computing the derivatives directly, but I can't seem to find anywhere that explains WHY.
So, why is gradient checking slower than calculating the derivative directly?
How much slower is it?
Upvotes: 1
Views: 762
Reputation: 25972
What you are doing in back-propagation is the backwards mode of automatic/algorithmic differentiation for a function that has a very large number N of inputs and only one output. The "inputs" here are chiefly the real-number parameters of the nodes of the neural net, and possibly also the input variables of the net.
In the backwards mode you compute the derivatives with respect to all inputs in one pass through the chain of operations. This costs about 3 function evaluations, plus the organizational overhead of executing the operation chain backwards and of storing and accessing the intermediate results.
In the forward mode for the same situation, which is what you use for the "gradient checking", you need to compute each derivative individually, regardless of whether you push forward AD derivatives or compute divided differences. The total cost of that is about 2N function evaluations.

And since N is large, 2N is much larger than 3.
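To make the 2N count concrete, here is a minimal sketch of the divided-difference form of gradient checking in Python/NumPy (my own illustration, not code from the answer; the toy quadratic loss, the vector theta and the step eps are just placeholders). The loop needs two loss evaluations per parameter, i.e. about 2N forward passes, whereas back-propagation delivers the whole gradient in a single backward pass:

```python
import numpy as np

def numerical_gradient(loss, theta, eps=1e-5):
    """Gradient checking via central differences: 2 loss evaluations per parameter."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):        # loop over all N parameters
        old = theta[i]
        theta[i] = old + eps
        loss_plus = loss(theta)        # evaluation 1 for parameter i
        theta[i] = old - eps
        loss_minus = loss(theta)       # evaluation 2 for parameter i
        theta[i] = old                 # restore the parameter
        grad[i] = (loss_plus - loss_minus) / (2 * eps)
    return grad                        # total work: about 2*N loss evaluations

# Toy check: for loss(theta) = 0.5 * ||theta||^2 the exact gradient is theta itself,
# which stands in for what back-propagation would return in one backward pass.
loss = lambda t: 0.5 * np.sum(t ** 2)
theta = np.random.randn(1000)
print(np.max(np.abs(numerical_gradient(loss, theta) - theta)))  # tiny (rounding error only)
```

That is also why gradient checking is normally run only once, on a small network, to verify that the back-propagation code is correct, and then switched off for actual training.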
Upvotes: 4