Reputation: 11
I'm studying BERT right now.
I thought BERT limits its position embeddings to 512 because of memory constraints. However, when I looked at the BERT code in Hugging Face, I found this parameter in the config:
max_position_embeddings: The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
As I understand this, BERT's maximum length can be set to 1024 or 2048, which is more than 512. I don't understand how this is possible.
Could someone explain it in more detail?
Upvotes: 1
Views: 5401
Reputation: 11240
When training your BERT model, you can decide on whatever maximum length you want. However, you cannot change the maximum length once the model is trained (or pre-trained). BERT uses learned position embeddings, so a model pre-trained with a maximum length of 512 simply never learns embeddings for positions beyond 512.
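As a minimal sketch with the Hugging Face transformers library (assuming a model trained from scratch rather than a pre-trained checkpoint), the maximum length is fixed when the config is created, because it determines the size of the position-embedding table:

    from transformers import BertConfig, BertModel

    # Choose the maximum sequence length before training;
    # it is baked into the learned position-embedding table.
    config = BertConfig(max_position_embeddings=1024)
    model = BertModel(config)

    # One embedding row per position, so inputs longer than
    # 1024 tokens have no position embedding to look up.
    print(model.embeddings.position_embeddings.weight.shape)
    # torch.Size([1024, 768]) with the default hidden size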
The memory a Transformer needs for self-attention grows quadratically with the sequence length, so it makes sense to limit the sequence length in advance.
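A rough back-of-the-envelope sketch of that quadratic growth (illustrative numbers only, assuming 12 attention heads and 4-byte floats): the attention score matrix is seq_len x seq_len per head, so doubling the length quadruples its size.

    # Illustrative only: size of the attention score matrix per layer.
    heads, bytes_per_float = 12, 4
    for seq_len in (512, 1024, 2048):
        scores = heads * seq_len * seq_len * bytes_per_float
        print(f"{seq_len:5d} tokens -> {scores / 2**20:6.0f} MiB of attention scores")
    #   512 tokens ->     12 MiB of attention scores
    #  1024 tokens ->     48 MiB of attention scores
    #  2048 tokens ->    192 MiB of attention scores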
Upvotes: 2