Reputation: 1
I'm new to the inner workings of automatic differentiation and came across some papers and slides stating that vector-Jacobian products can be computed in linear time using automatic differentiation. Specifically, they write that

$e^\top \left( \frac{\partial f}{\partial x} \right)$

can be computed for any $e$ in $O(N)$, where the Jacobian $\frac{\partial f}{\partial x}$ is $N \times N$. I would have thought this takes $O(N^2)$, but I don't fully understand how automatic differentiation can reduce the time complexity.
Upvotes: 0
Views: 734
Reputation: 698
I don't fully understand how automatic differentiation can reduce the time complexity.
I'll assume you understand how backpropagation and/or reverse-mode automatic differentiation works for a scalar-valued function y = f(x), where f : R^n -> R. Suppose evaluating f takes time O(1); the only fact used below is that a single backward pass costs the same order of time as the forward pass.
If you remember from backpropagation, you start by setting dy/dy = 1 and then compute dy/dx using the chain rule. What you are doing there is multiplying the length-1 vector [1.0] by the Jacobian, which for a scalar output is a 1-by-n row vector, and the result of that product is just the gradient with respect to x. In other words, you are already computing a VJP; your vector just happens to have length 1, and you get it in a single backward pass, in time O(1).
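As a minimal sketch (in JAX, which the post does not mention; the function f below is my own arbitrary example), the gradient of a scalar-valued function is exactly the VJP seeded with 1.0:

```python
import jax
import jax.numpy as jnp

def f(x):                                    # f : R^n -> R
    return jnp.sum(jnp.sin(x) ** 2)

x = jnp.arange(5.0)

y, vjp_fn = jax.vjp(f, x)                    # forward pass, recording the computation
(grad_x,) = vjp_fn(jnp.ones_like(y))         # one backward pass, seeded with "[1.0]"

print(jnp.allclose(grad_x, jax.grad(f)(x)))  # True: identical to the usual gradient
```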
Now suppose I'm differentiating a vector-valued function y = f(x), where f : R^n -> R^m. Then the Jacobian is indeed m-by-n, and computing the full Jacobian with reverse-mode AD would take time O(m), since you need one backward pass per output. However, you can compute the vector-Jacobian product with a vector v in time O(1): instead of seeding with dy_i/dy_i = 1, set the adjoint of each output y_i to v_i for i = 1, ..., m, then backpropagate as usual, and you get the VJP in a single backward pass.
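Here is the same sketch for the vector-valued case (again JAX, with a made-up f): seeding the backward pass with v gives v^T J in one pass, without ever building the Jacobian.

```python
import jax
import jax.numpy as jnp

def f(x):                         # f : R^4 -> R^3
    return jnp.stack([jnp.sum(x ** 2), jnp.prod(x), jnp.sum(jnp.cos(x))])

x = jnp.arange(1.0, 5.0)
v = jnp.array([1.0, -2.0, 0.5])   # the vector in the VJP

y, vjp_fn = jax.vjp(f, x)
(vjp,) = vjp_fn(v)                # single backward pass, seeded with v

# jacrev is only used to check the result; it effectively runs one backward
# pass per output, i.e. the O(m) full-Jacobian cost mentioned above.
J = jax.jacrev(f)(x)              # full 3 x 4 Jacobian
print(jnp.allclose(vjp, v @ J))   # True
```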
Upvotes: 1
Reputation: 1
If you add the assumption that the hidden dimension is fixed, then it is linear. Suppose your function f is a composition of linear maps, where the fixed matrix f1 is H-by-N, f2 is H-by-H, and f3 is N-by-H, so the Jacobian is the N-by-N matrix f3 f2 f1. Then for an N-dimensional vector e, you can evaluate e^T (f3 f2 f1) as a chain of matrix-vector products: first e^T f3, then multiply the result by f2, then by f1. The total cost is O(NH + H^2), which scales linearly with N for fixed H, but not with the hidden dimension H.
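As a small numeric sketch of this argument (JAX here; the sizes and random matrices are made up for illustration), the VJP of the composed linear map can be evaluated as chained matrix-vector products, never forming the N-by-N Jacobian:

```python
import jax
import jax.numpy as jnp

N, H = 1000, 50
k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)
f1 = jax.random.normal(k1, (H, N))          # H x N
f2 = jax.random.normal(k2, (H, H))          # H x H
f3 = jax.random.normal(k3, (N, H))          # N x H
e = jax.random.normal(k4, (N,))

# Chained matrix-vector products: O(N*H + H^2) work, linear in N for fixed H.
vjp = ((e @ f3) @ f2) @ f1

# Versus materialising the full N x N Jacobian, which already costs O(N^2 * H)
# work and O(N^2) storage before the final product with e.
J = f3 @ f2 @ f1
print(jnp.allclose(vjp, e @ J, rtol=1e-3))  # True (loose tolerance for float32)
```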
Upvotes: 0