Reputation: 999
I want to mask all the zeros in the score matrix with -np.inf, but only some of them get masked.
In the output, the upper right corner still contains zeros that were not replaced with -np.inf.
Here's my code:
import math
import numpy as np
import torch
# Four random rows followed by two all-zero (padding) rows; every row has length 10.
q = torch.Tensor([np.random.random(10),np.random.random(10),np.random.random(10), np.random.random(10), np.zeros(10), np.zeros(10)])
k = torch.Tensor([np.random.random(10),np.random.random(10),np.random.random(10), np.random.random(10), np.zeros(10), np.zeros(10)])
scores = torch.matmul(q, k.transpose(0,1)) / math.sqrt(10)
mask = torch.Tensor([1,1,1,1,0,0])
mask = mask.unsqueeze(1)
scores = scores.masked_fill(mask==0, -np.inf)
Maybe the mask is wrong?
Upvotes: 4
Views: 15461
Reputation: 67
Alternatively, a small change to your code makes it work: apply the mask along both dimensions.
import math
import numpy as np
import torch
# Four random rows followed by two all-zero (padding) rows; every row has length 10.
q = torch.Tensor([np.random.random(10),np.random.random(10),np.random.random(10), np.random.random(10), np.zeros(10), np.zeros(10)])
k = torch.Tensor([np.random.random(10),np.random.random(10),np.random.random(10), np.random.random(10), np.zeros(10), np.zeros(10)])
scores = torch.matmul(q, k.transpose(0,1)) / math.sqrt(10)
mask = torch.Tensor([1,1,1,1,0,0])
# Shape [6, 1]: masks the last two rows.
mask2 = mask.unsqueeze(1)
scores = scores.masked_fill(mask2==0, -np.inf)
# Shape [1, 6]: masks the last two columns.
mask = mask.unsqueeze(0)
scores = scores.masked_fill(mask==0, -np.inf)
scores
Upvotes: 0
Reputation: 51
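Ying, your code is right and the output shows the expected behaviour. Your mask currently has shape [6, 1], so it is broadcast across the columns and masks only the last two rows (that is, the last two elements of each column). The zeros in the upper right corner sit in the last two columns of the first four rows, which this mask does not cover.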
>>> mask = torch.Tensor([1,1,1,1,0,0])
>>> mask.shape
torch.Size([6])
>>> mask = mask.unsqueeze(1)
>>> mask.shape
torch.Size([6, 1])
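To make the broadcasting visible, here is a minimal sketch that applies the same validity vector once as a [6, 1] mask and once as a [1, 6] mask. The all-zero `scores` tensor is only a stand-in for the real score matrix, and the names `row_mask`, `col_mask`, `rows_only` and `cols_only` are mine, not from the question.

```python
import torch

# Same validity vector as in the question: 1 = real position, 0 = padding.
mask = torch.tensor([1, 1, 1, 1, 0, 0])

# Stand-in for the 6x6 score matrix; all zeros so the effect is easy to see.
scores = torch.zeros(6, 6)

row_mask = mask.unsqueeze(1)  # shape [6, 1]: broadcast across columns, hits whole rows
col_mask = mask.unsqueeze(0)  # shape [1, 6]: broadcast across rows, hits whole columns

rows_only = scores.masked_fill(row_mask == 0, float("-inf"))
cols_only = scores.masked_fill(col_mask == 0, float("-inf"))

print(rows_only)  # last two rows are -inf, upper right block is untouched
print(cols_only)  # last two columns are -inf, lower left block is untouched
```

Neither mask alone covers both the padded rows and the padded columns, which is why the [6, 1] mask leaves zeros in the upper right corner.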
Upvotes: 1
Reputation: 51
In mujjiga's code, the scores tensor is itself used as the mask, so every 0 is replaced with -inf. That is not the usual intended use of a mask, though: a mask is generally independent of the tensor being masked.
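As a rough illustration of the difference, the sketch below builds a full 6x6 padding mask from the validity vector alone and compares it with masking on `scores == 0`. The score values are made up, and the zero placed at `scores[0, 1]` is a hypothetical genuine score between two real positions; `valid`, `pad`, `full_mask`, `by_mask` and `by_value` are names I chose for the example.

```python
import torch

torch.manual_seed(0)

# True = real position, False = padding (same layout as the question).
valid = torch.tensor([1, 1, 1, 1, 0, 0], dtype=torch.bool)

# Made-up scores: strictly positive for real pairs, zero wherever padding is involved.
scores = torch.rand(6, 6) + 0.1
scores[~valid, :] = 0.0
scores[:, ~valid] = 0.0
scores[0, 1] = 0.0  # hypothetical genuine zero between two real positions

# Mask derived from the padding information only, independent of the score values.
pad = ~valid
full_mask = pad.unsqueeze(0) | pad.unsqueeze(1)  # shape [6, 6]

by_mask = scores.masked_fill(full_mask, float("-inf"))
by_value = scores.masked_fill(scores == 0, float("-inf"))

print(by_mask[0, 1])   # stays 0: the position is real, so it is kept
print(by_value[0, 1])  # becomes -inf: the value-based mask cannot tell it from padding
```

With the independent mask, only the padded rows and columns are filled, whatever values the real positions happen to hold.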
Upvotes: 4
Reputation: 16856
Your mask is wrong. Try
scores = scores.masked_fill(scores == 0, -np.inf)
scores now looks like
tensor([[1.4796, 1.2361, 1.2137, 0.9487,   -inf,   -inf],
        [0.6889, 0.4428, 0.6302, 0.4388,   -inf,   -inf],
        [0.8842, 0.7614, 0.8311, 0.6431,   -inf,   -inf],
        [0.9884, 0.8430, 0.7982, 0.7323,   -inf,   -inf],
        [  -inf,   -inf,   -inf,   -inf,   -inf,   -inf],
        [  -inf,   -inf,   -inf,   -inf,   -inf,   -inf]])
Upvotes: 10