Reputation: 4766
Suppose we have an embedding matrix of 10 vectors with dimension 100, and we impose max_norm=1:
import torch
from torch.nn import Embedding

x = Embedding(num_embeddings=10, embedding_dim=100, max_norm=1)
In principle, every embedding should have a norm less than or equal to 1. However, when I print the vector norms, I get values much greater than 1:
for w in x.weight:
    print(torch.norm(w))
> tensor(11.1873, grad_fn=<CopyBackwards>)
> tensor(10.5264, grad_fn=<CopyBackwards>)
> tensor(9.6809, grad_fn=<CopyBackwards>)
> tensor(9.7507, grad_fn=<CopyBackwards>)
> tensor(10.7940, grad_fn=<CopyBackwards>)
> tensor(11.4134, grad_fn=<CopyBackwards>)
> tensor(9.7021, grad_fn=<CopyBackwards>)
> tensor(10.4027, grad_fn=<CopyBackwards>)
> tensor(10.1210, grad_fn=<CopyBackwards>)
> tensor(10.4552, grad_fn=<CopyBackwards>)
Any particular reason why this happens and how to fix it?
Upvotes: 4
Views: 1728
Reputation: 1303
The max_norm argument bounds the norm of the embedding vectors returned by a lookup (the forward pass), not the raw weights stored in the module. The renormalization is applied only when a vector is looked up, which is why printing x.weight directly shows norms greater than 1.
To better understand this, you can run the following example:
from torch import LongTensor, norm
from torch.nn import Embedding

sentences = LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
embedding = Embedding(num_embeddings=10, embedding_dim=100, max_norm=1)

# The lookup embedding(sentences) returns renormalized vectors,
# so every printed norm is at most 1.
for sentence in embedding(sentences):
    for word in sentence:
        print(norm(word))
Whenever a vector's norm exceeds max_norm, the lookup rescales it: each component is divided by the norm of the embedding vector itself and multiplied by max_norm. In your example max_norm=1, so this amounts to dividing the vector by its norm.
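If you want to verify this mechanism yourself, here is a minimal sketch (the index 3 and the seed are arbitrary choices for illustration; PyTorch also adds a tiny epsilon to the norm for numerical stability, hence the tolerance in the comparison):

import torch
from torch.nn import Embedding

torch.manual_seed(0)
embedding = Embedding(num_embeddings=10, embedding_dim=100, max_norm=1)

# Keep a copy of the raw weights before any lookup, because the
# forward pass rescales the looked-up rows in place.
raw = embedding.weight.detach().clone()

out = embedding(torch.LongTensor([3])).detach()

# Manual renormalization as described above: divide the row by its
# norm and multiply by max_norm (here max_norm=1, so just a division).
manual = raw[3] / raw[3].norm() * 1.0

print(torch.allclose(out[0], manual, atol=1e-5))  # expected: True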
To answer the question you asked in the comment: you can obtain the embeddings of a sentence (a tensor of word indices taken from your dictionary) with embedding(sentences), and their norms with the two for loops above.
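If you prefer to avoid the explicit loops, the norms can be computed in a single call. As a sketch (assuming a reasonably recent PyTorch, where a forward pass with max_norm renormalizes the looked-up rows of embedding.weight in place):

import torch
from torch import LongTensor
from torch.nn import Embedding

sentences = LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
embedding = Embedding(num_embeddings=10, embedding_dim=100, max_norm=1)

# Norms of every word embedding in the batch, no explicit loops:
# a (2, 4) tensor whose entries are all <= 1.
print(torch.norm(embedding(sentences), dim=-1))

# The lookup also rescales the accessed rows of the weight matrix in
# place, so rows 1, 2, 3, 4, 5 and 9 now have norm <= 1, while the
# remaining rows keep their original, larger norms.
print(torch.norm(embedding.weight, dim=-1))

In other words, the bound from max_norm only takes effect for rows that have actually been looked up at least once, which is why the freshly initialized weights in the question still have large norms.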
Upvotes: 4