Reputation: 1099
I am using the GPT2 pre-trained model for a research project and when I load the pre-trained model with the following code,
from transformers.models.gpt2.modeling_gpt2 import GPT2Model
gpt2 = GPT2Model.from_pretrained('gpt2')
I get the following warning message:
Some weights of GPT2Model were not initialized from the model checkpoint at gpt2 and are newly initialized: ['h.0.attn.masked_bias', 'h.1.attn.masked_bias', 'h.2.attn.masked_bias', 'h.3.attn.masked_bias', 'h.4.attn.masked_bias', 'h.5.attn.masked_bias', 'h.6.attn.masked_bias', 'h.7.attn.masked_bias', 'h.8.attn.masked_bias', 'h.9.attn.masked_bias', 'h.10.attn.masked_bias', 'h.11.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
From my understanding, it says that the weights of the above layers are not initialized from the pre-trained model. But we all know that attention layers ('attn') are so important in GPT2 and if we can not have their actual weights from the pre-trained model, then what is the point of using a pre-trained model?
Why is this happening and how can I fix it?
Upvotes: 3
Views: 2198