Reputation: 2161
I understand what offsets means when it has two numbers, but what does it mean when it has more than two numbers? For example:
import torch
import torch.nn as nn

weight = torch.FloatTensor([[1, 2, 3], [4, 5, 6]])
embedding_sum = nn.EmbeddingBag.from_pretrained(weight, mode='sum')
print(list(embedding_sum.parameters()))
input = torch.LongTensor([0,1])
offsets = torch.LongTensor([0,1,2,1])
print(embedding_sum(input, offsets))
The result is:
[Parameter containing:
tensor([[1., 2., 3.],
[4., 5., 6.]])]
tensor([[1., 2., 3.],
[4., 5., 6.],
[0., 0., 0.],
[0., 0., 0.]])
Can anyone help me understand this?
Upvotes: 3
Views: 3223
Reputation: 1304
import torch
import torch.nn as nn
weight = torch.FloatTensor([[1, 2, 3], [4, 5, 6]])
embedding_sum = nn.EmbeddingBag.from_pretrained(weight, mode='sum')
print(embedding_sum.weight)
""" output
Parameter containing:
tensor([[1., 2., 3.],
[4., 5., 6.]])
"""
input = torch.LongTensor([0, 1])
offsets = torch.LongTensor([0, 1, 2, 1])
According to these offsets, you will get the following samples:
"""
sample_1: input[0:1] # tensor([0])
sample_2: input[1:2] # tensor([1])
sample_3: input[2:1] # tensor([])
sample_4: input[1:] # tensor([1])
"""
Embedding the samples above:
# tensor([0]) => lookup 0 => embedding_sum.weight[0] => [1., 2., 3.]
# tensor([1]) => lookup 1 => embedding_sum.weight[1] => [4., 5., 6.]
# tensor([]) => empty bag => [0., 0., 0.]
# tensor([1]) => lookup 1 => embedding_sum.weight[1] => [4., 5., 6.]
print(embedding_sum(input, offsets))
""" output
tensor([[1., 2., 3.],
[4., 5., 6.],
[0., 0., 0.],
[4., 5., 6.]])
"""
One more example:
input = torch.LongTensor([0, 1])
offsets = torch.LongTensor([0, 1, 0])
According to these offsets, you will get the following samples:
"""
sample_1: input[0:1] # tensor([0])
sample_2: input[1:0] # tensor([])
sample_3: input[0:] # tensor([0, 1])
"""
Embedding the samples above:
# tensor([0]) => lookup 0 => embedding_sum.weight[0] => [1., 2., 3.]
# tensor([]) => empty bag => [0., 0., 0.]
# tensor([0, 1]) => lookup 0 and 1 then reduce by sum
# => embedding_sum.weight[0] + embedding_sum.weight[1] => [5., 7., 9.]
print(embedding_sum(input, offsets))
""" output
tensor([[1., 2., 3.],
[0., 0., 0.],
[5., 7., 9.]])
"""
Upvotes: 1
Reputation: 6125
As shown in the source code, EmbeddingBag.forward simply delegates to the functional API (the exact argument list varies slightly across PyTorch versions):
return F.embedding_bag(input, self.weight, offsets,
                       self.max_norm, self.norm_type,
                       self.scale_grad_by_freq, self.mode, self.sparse)
It uses the functional embedding_bag, which documents the offsets parameter as:
offsets (LongTensor, optional) – Only used when input is 1D. offsets determines the starting index position of each bag (sequence) in input.
In the EmbeddingBag docs:
If input is 1D of shape (N), it will be treated as a concatenation of multiple bags (sequences). offsets is required to be a 1D tensor containing the starting index positions of each bag in input. Therefore, for offsets of shape (B), input will be viewed as having B bags. Empty bags (i.e., having 0-length) will have returned vectors filled by zeros.
The last statement ("Empty bags (i.e., having 0-length) will have returned vectors filled by zeros.") explains the zero vectors in your resulting tensor.
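You can see that statement in action with the weights from the question, using non-decreasing offsets (a rough sketch):
import torch
import torch.nn as nn

weight = torch.FloatTensor([[1, 2, 3], [4, 5, 6]])
embedding_sum = nn.EmbeddingBag.from_pretrained(weight, mode='sum')

input = torch.LongTensor([0, 1])
offsets = torch.LongTensor([0, 2])   # bag 1 is input[0:2], bag 2 is input[2:] => empty
print(embedding_sum(input, offsets))
""" output
tensor([[5., 7., 9.],
        [0., 0., 0.]])
"""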
Upvotes: 1