Reputation: 15
I have heard that fastText generates vectors for OOV words from their character n-grams. Is this built into the fastText architecture automatically, or do we need to set specific parameters for it, like oov_token in the Keras tokenizer? I have looked for the relevant fastText parameters but couldn't find any.
If anyone knows and is willing to share, I would really appreciate it.
Thank you.
Upvotes: 0
Views: 662
Reputation: 3536
Vector generation for OOV words is integrated into fastText (at least in the original implementation by Facebook).
To generate these vectors, fastText uses subword n-grams. To learn more, you can read this thread and this visual guide.
For this reason, the parameters that most influence the creation of vectors for OOV words are the following:
minn (min length of char ngram)
maxn (max length of char ngram)
For more information about fastText options/parameters, see the official documentation.
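To illustrate how minn and maxn decide which subwords an OOV word is broken into, here is a minimal pure-Python sketch. It is not fastText's actual code: the helper name char_ngrams is made up for this example, and the hashing of n-grams into buckets and the combination of their vectors into a word vector are left out. It only assumes that fastText wraps each word in "<" and ">" boundary markers before extracting n-grams.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of `word` with lengths minn..maxn.

    Hypothetical helper for illustration only; not part of the fastText API.
    """
    # Assumption: the word is wrapped in boundary markers, as fastText does.
    wrapped = f"<{word}>"
    ngrams = []
    # Start at length 1 at the lowest so that maxn=0 yields no n-grams at all.
    for n in range(max(minn, 1), maxn + 1):
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams


# Subwords that would contribute to the vector of an unseen word.
print(char_ngrams("where", minn=3, maxn=4))
# No subwords at all when maxn is 0.
print(char_ngrams("where", minn=0, maxn=0))
```

An OOV word can only get a meaningful vector if at least some of these n-grams were seen during training, so a wider minn/maxn range gives more subwords to draw on, while maxn=0 leaves none. In the official Python bindings, as far as I know, these are passed as the minn and maxn keyword arguments when training.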
Upvotes: 1