Reputation: 51
I recently read about BERT and want to use BertForMaskedLM for the fill_mask task. I know the BERT architecture. Also, as far as I know, BertForMaskedLM is built from BERT with a language modeling head on top, but I have no idea what a language modeling head means here. Can anyone give me a brief explanation?
Upvotes: 5
Views: 6737
Reputation: 17
In addition to @Ashwin Geet D'Sa's answer, here is Hugging Face's definition of the LM head:
The model head refers to the last layer of a neural network that accepts the raw hidden states and projects them onto a different dimension.
You can find Hugging Face's definitions of other terms on the glossary page: https://huggingface.co/docs/transformers/glossary
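To see this concretely, here is a minimal sketch (assuming the transformers library and the bert-base-uncased checkpoint) that loads BertForMaskedLM and prints its two parts, the base encoder and the head stacked on top of it:

```python
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-uncased")

print(model.bert)  # the base encoder that produces the raw hidden states
print(model.cls)   # the MLM head that projects hidden states onto the vocabulary
```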
Upvotes: 1
Reputation: 7379
As you have understood correctly, BertForMaskedLM uses a language modeling (LM) head.
Generally, and in this case as well, the LM head is a linear layer whose input dimension matches the hidden-state size (768 for BERT-base) and whose output dimension is the vocabulary size. It therefore maps each hidden-state output of the BERT model to a score for every token in the vocabulary. The loss is then computed (typically with cross-entropy) by comparing these scores against the target token.
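Here is a minimal sketch of this mapping (assuming the transformers library and the bert-base-uncased checkpoint): the head turns each 768-dimensional hidden state into a score over the full vocabulary, and the highest-scoring token at the [MASK] position is the prediction.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

print(logits.shape)  # e.g. torch.Size([1, 9, 30522]) for bert-base-uncased

# Find the [MASK] position and take the highest-scoring vocabulary token there
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # likely "paris"
```

During training, the cross-entropy loss is computed between each row of these per-token scores and the id of the true (masked-out) token.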
Upvotes: 3