Reputation: 1579
During training I make a lot of calls to torch.cat() and copy tensors into new tensors. How does autograd handle these operations? Do they affect the gradient values?
Upvotes: 1
Views: 214
Reputation: 22184
As pointed out in the comments, cat is a mathematical function. For example, we could write the following (special-case) definition of cat for two vectors in more traditional mathematical notation:

$$\operatorname{cat} \colon \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^{n+m}, \qquad \operatorname{cat}(x, y)_i = \begin{cases} x_i & 1 \le i \le n \\ y_{i-n} & n < i \le n+m \end{cases}$$

The Jacobian of this function w.r.t. either of its inputs can be expressed as

$$\frac{\partial\,\operatorname{cat}(x, y)}{\partial x} = \begin{bmatrix} I_n \\ 0_{m \times n} \end{bmatrix}, \qquad \frac{\partial\,\operatorname{cat}(x, y)}{\partial y} = \begin{bmatrix} 0_{n \times m} \\ I_m \end{bmatrix}$$

Since the Jacobian is well defined you can, of course, apply back-propagation.
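As a quick sanity check, a minimal sketch (the sizes n=2 and m=3 and the variable names are arbitrary, not from the original post) can verify these Jacobians numerically with torch.autograd.functional.jacobian:

```python
import torch

# Two small 1-D tensors; the sizes n=2 and m=3 are arbitrary.
x = torch.randn(2, dtype=torch.double)
y = torch.randn(3, dtype=torch.double)

def f(a, b):
    return torch.cat((a, b))

# Returns one Jacobian per input: d f / d x and d f / d y.
Jx, Jy = torch.autograd.functional.jacobian(f, (x, y))

print(Jx)  # 5x2: the 2x2 identity stacked on a 3x2 block of zeros
print(Jy)  # 5x3: a 2x3 block of zeros stacked on the 3x3 identity
```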
In practice you generally wouldn't define these operations with such notation, and a fully general definition of PyTorch's cat written this way would be cumbersome. That said, internally autograd uses backward algorithms that take the gradients of such "index style" operations into account just like for any other function.
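For instance, here is a minimal sketch (names and shapes are illustrative, not from the question) showing that backward through torch.cat simply routes each slice of the output gradient back to the corresponding input, and that copying into a freshly allocated tensor via index assignment is tracked the same way:

```python
import torch

a = torch.ones(2, requires_grad=True)
b = torch.ones(3, requires_grad=True)

# Weight each element of the concatenation differently so the
# routing of gradients back to a and b is visible.
c = torch.cat((a, b))
w = torch.arange(1., 6.)  # weights 1..5
(c * w).sum().backward()

print(a.grad)  # tensor([1., 2.])     -> the first two weights
print(b.grad)  # tensor([3., 4., 5.]) -> the last three weights

# Copying into a new tensor behaves similarly: index assignment is
# an in-place op that autograd records, so gradients still reach a2.
a2 = torch.ones(2, requires_grad=True)
buf = torch.zeros(5)
buf[:2] = a2     # tracked by autograd (CopySlices)
buf[2:] = 7.0    # constant fill; no gradient flows here
(buf * w).sum().backward()
print(a2.grad)   # tensor([1., 2.])
```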
Upvotes: 1