Shadekur Rahman
Shadekur Rahman

Reputation: 73

Is there any pretrained word2vec model capable of detecting phrase

Is there any pretrained word2vec model with data containing both single word or multiple words coalesced together such as 'drama', 'drama_film' or '‘africanamericancommunity’. Is there any such model trained with huge dataset such as dataset trained for gloVE?

Upvotes: 1

Views: 568

Answers (1)

teoML
teoML

Reputation: 836

I did a quick search on google, but unfortunately I could not find a pretrained model. One way to train your own model to detect phrases is to use a bigram model. So, you can take a big wikipedia dump, for instance, preprocess is using bigrams and train the word2vec model. A good github project which can help you to achieve this is https://github.com/KeepFloyding/wikiNLPpy A nice article on the topic: https://towardsdatascience.com/word2vec-for-phrases-learning-embeddings-for-more-than-one-word-727b6cf723cf

As stated in google pre-trained word2vec, the pre-trained model by google already contains some phrases (bigrams).

Upvotes: 1

Related Questions