Noah

Reputation: 11

Can I restrict a masked language model's output range?

When models are trained with masked language modeling, the input embeddings at masked positions are replaced with a [MASK] token. I'm wondering whether I can restrict the range of words the model may predict for a [MASK] token. For example:

1) "What a [MASK] weather!" 
2) "What a [MASK] person he is!"
3) "How can you do such a [MASK] thing!" 

...

Instead of letting a pretrained model search its whole vocabulary for a suitable word, I want it to pick a word from a specific token set, e.g. {"good", "great", "stupid", "bad"}, to replace the [MASK] token. In other words, for any kind of input, the model should fill the [MASK] position only with a word from that specific token set. Could anyone give me some hints on how to do this? Thanks!

Upvotes: 0

Views: 51

Answers (1)

stetinden

Reputation: 26

The output layer in language models is a linear projection from the hidden dimension to a pre-determined vocabulary size, so in a sense, this is already what is happening if you use an out-of-the-box LM. If you want to make this smaller than the vocab typically used for a language, e.g. ~30k for English, you have to either restrict the output layer to a subset of your tokenizer's vocabulary, or train your own tokenizer with a much smaller number of possible sub-word tokens.
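As a rough sketch of that static variant (not the only way to do it), you could swap the MLM head's decoder for a smaller projection over your token subset. The module path `model.cls.predictions.decoder` is specific to `BertForMaskedLM`, the checkpoint name and the four-word subset are just assumptions taken from the question, and the shrunken head still needs fine-tuning before it is useful:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical restricted vocabulary; each word is assumed to be a single token
subset = ["good", "great", "stupid", "bad"]
subset_ids = tokenizer.convert_tokens_to_ids(subset)

# Swap the (hidden_size -> vocab_size) decoder for a (hidden_size -> len(subset)) one
old_decoder = model.cls.predictions.decoder          # BertForMaskedLM-specific path
new_decoder = nn.Linear(model.config.hidden_size, len(subset_ids))

# Initialize the new head from the corresponding rows of the original decoder,
# so it starts from the pretrained word representations instead of random weights
with torch.no_grad():
    new_decoder.weight.copy_(old_decoder.weight[subset_ids])
    new_decoder.bias.copy_(old_decoder.bias[subset_ids])

model.cls.predictions.decoder = new_decoder
# The head now emits len(subset) logits per position; fine-tune it with your own
# loss, with the labels re-indexed into the range 0..len(subset)-1.
```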

If you want to do this dynamically, so that the subset of possible tokens differs for each input sample, then you would need some way of knowing which tokens are relevant possibilities at each step, essentially leaking information to the model that it would otherwise have to learn during training. You can achieve this by masking out the output positions of the non-relevant tokens (e.g. assigning them a large negative weight), but the result will probably be a model with much lower generalization power.
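As an illustration of that masking idea, here is a minimal inference-time sketch using the Hugging Face transformers library. The checkpoint name `bert-base-uncased` and the four candidate words are assumptions taken from the question, and each candidate is assumed to map to a single token in the tokenizer's vocabulary:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

text = "What a [MASK] weather!"
candidates = ["good", "great", "stupid", "bad"]  # hypothetical restricted set
candidate_ids = tokenizer.convert_tokens_to_ids(candidates)

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Position(s) of the [MASK] token in the input
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

# Set every logit outside the candidate set to -inf,
# so the argmax can only pick one of the candidates
mask_logits = logits[0, mask_positions]              # (num_masks, vocab_size)
restricted = torch.full_like(mask_logits, float("-inf"))
restricted[:, candidate_ids] = mask_logits[:, candidate_ids]
predicted_ids = restricted.argmax(dim=-1)

print(tokenizer.decode(predicted_ids))  # best candidate for each masked slot
```

Note that this only constrains decoding at inference time; the model's weights are untouched, which is often enough when the candidate set is as small as in your example.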

Upvotes: 0
