Belphegor

Reputation: 4766

Embedding in PyTorch creates embedding with norm larger than max_norm

Suppose we have an embedding matrix of 10 vectors of dimension 100, and we impose max_norm=1:

import torch
from torch.nn import Embedding

x = Embedding(num_embeddings=10, embedding_dim=100, max_norm=1)

In principle, every embedding should have a norm less than or equal to 1. However, when I print the vector norms, I get values much greater than 1:

for w in x.weight: 
    print(torch.norm(w))

> tensor(11.1873, grad_fn=<CopyBackwards>)
> tensor(10.5264, grad_fn=<CopyBackwards>)
> tensor(9.6809, grad_fn=<CopyBackwards>)
> tensor(9.7507, grad_fn=<CopyBackwards>)
> tensor(10.7940, grad_fn=<CopyBackwards>)
> tensor(11.4134, grad_fn=<CopyBackwards>)
> tensor(9.7021, grad_fn=<CopyBackwards>)
> tensor(10.4027, grad_fn=<CopyBackwards>)
> tensor(10.1210, grad_fn=<CopyBackwards>)
> tensor(10.4552, grad_fn=<CopyBackwards>)

Any particular reason why this happens and how to fix it?

Upvotes: 4

Views: 1728

Answers (1)

gionni

Reputation: 1303

The max_norm argument bounds the norm of the embedding vectors returned by a lookup, not the norm of the stored weights: rows are only renormalized when they are actually accessed in a forward pass, so printing x.weight directly still shows the unconstrained initial values.

To better understand this, you can run the following example:

from torch import LongTensor, norm
from torch.nn import Embedding

sentences = LongTensor([[1,2,4,5],[4,3,2,9]])
embedding = Embedding(num_embeddings=10, embedding_dim=100, max_norm=1)
for sentence in embedding(sentences):
    for word in sentence:
        print(norm(word))

The renormalization works by scaling each embedding vector whose norm exceeds max_norm: every component is divided by the vector's norm and multiplied by max_norm. In your example max_norm=1, so this is equivalent to dividing by the norm.
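
As a minimal sketch of that scaling (the fixed seed and the index 3 are arbitrary choices for illustration, and the comparison is only expected to match up to numerical tolerance), you can reproduce the renormalization by hand and compare it with what a lookup returns:

import torch
from torch.nn import Embedding

torch.manual_seed(0)
max_norm = 1
embedding = Embedding(num_embeddings=10, embedding_dim=100, max_norm=max_norm)

# Copy the raw weights before any lookup triggers the in-place renormalization.
raw = embedding.weight.detach().clone()

# Looking up index 3 renormalizes that row to max_norm.
looked_up = embedding(torch.LongTensor([3]))[0]

# Manual renormalization: scale the row down only when its norm exceeds max_norm.
row = raw[3]
scale = (max_norm / row.norm()).clamp(max=1.0)
manual = row * scale

print(looked_up.norm())                              # ~1.0
print(manual.norm())                                 # ~1.0
print(torch.allclose(looked_up, manual, atol=1e-5))  # expected True, up to tolerance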

To answer the question you asked in the comment: you can obtain the embedding of a sentence (a vector of word indexes taken from your dictionary) with embedding(sentences), and the norms with the two for loops above.
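
As a quick sketch of the same check without explicit loops (nothing beyond standard tensor operations is assumed here), the norms of all word vectors can be computed in one call:

from torch import LongTensor
from torch.nn import Embedding

sentences = LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
embedding = Embedding(num_embeddings=10, embedding_dim=100, max_norm=1)

# Shape: (batch, sentence_length, embedding_dim)
embedded = embedding(sentences)

# Norm of every word vector at once; each entry should be <= 1.
print(embedded.norm(dim=-1))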

Upvotes: 4
