Reputation: 1
I want to create an upper triangular tensor in PyTorch whose lower half is constant zeros, and those zeros should receive no gradient.
When I use torch.triu()
to get the upper triangular tensor, the lower half still receives gradients, which means those "zeros" are not constant.
So how can I get an upper triangular tensor whose lower half stays a constant zero?
import torch

a = torch.randn(5, 5)
c = torch.randn(1, 5)
b = torch.triu(a).requires_grad_()  # upper triangular tensor
loss = torch.matmul(c, b)
loss = loss.sum()
loss.backward()
print(b.grad)  # nonzero gradients also appear in the lower triangle
Upvotes: 0
Views: 1758
Reputation: 22224
It does appear that torch.triu()
gives you an upper triangular matrix, and the gradients appear to be correct.
For example, let's say that
x = torch.randn(5, 5, requires_grad=True)
produces
tensor([[ 0.1907, -0.0990, 1.0373, 0.3676, -0.2752],
[ 1.8987, 1.0265, -0.1133, 0.1476, -3.5617],
[ 1.2581, 0.2860, -1.9215, -0.7674, -0.1687],
[ 0.1559, -0.9870, -0.6928, 0.1487, 0.3346],
[ 0.6317, -0.4915, 1.2506, 0.8678, 0.6367]], requires_grad=True)
Then we can take the upper triangular part of x
using
y = torch.triu(x)
and y
will then be
tensor([[ 0.1907, -0.0990, 1.0373, 0.3676, -0.2752],
[ 0.0000, 1.0265, -0.1133, 0.1476, -3.5617],
[ 0.0000, 0.0000, -1.9215, -0.7674, -0.1687],
[ 0.0000, 0.0000, 0.0000, 0.1487, 0.3346],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.6367]], grad_fn=<TriuBackward>)
The claim that the lower half of the upper triangular tensor "receives gradients, which means those zeros are not constant" suggests there may be a little confusion about what is expected of the gradients here.
The only implication as far as gradients are concerned is that no matter what we do with y
, the lower part of x
will not have any impact on the final result since it has been effectively multiplied by a constant zero. Therefore the derivative of any function resulting from y
with respect to any of the lower components of x
will be zero. We find that this is indeed the case. For example
y.sum().backward()
populates x.grad
with the gradient of y.sum()
w.r.t. x
which is correctly reported as
# x.grad
tensor([[1., 1., 1., 1., 1.],
[0., 1., 1., 1., 1.],
[0., 0., 1., 1., 1.],
[0., 0., 0., 1., 1.],
[0., 0., 0., 0., 1.]])
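To put the pieces together, here is a minimal, self-contained sketch of the same example (using the names x and y from above; the final allclose check is just for illustration). Keeping x as the learnable tensor and re-applying torch.triu() in the forward pass means the lower half of y stays a constant zero and the corresponding entries of x.grad come out as exactly zero:
import torch

# Leaf tensor; only its upper triangular part influences the output.
x = torch.randn(5, 5, requires_grad=True)

# Re-apply triu in the forward pass so the lower half is a constant zero.
y = torch.triu(x)

y.sum().backward()

# The lower triangular entries of x.grad are exactly zero: the zeros in y
# are constants as far as autograd is concerned.
print(x.grad)
print(torch.allclose(x.grad, torch.triu(torch.ones(5, 5))))  # prints True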
Upvotes: 1