Reputation: 73
Is there any pretrained word2vec model whose vocabulary contains both single words and multi-word tokens coalesced together, such as 'drama', 'drama_film', or 'africanamericancommunity'? Is there any such model trained on a huge dataset, like the corpora used to train GloVe?
Upvotes: 1
Views: 568
Reputation: 836
I did a quick search on Google, but unfortunately I could not find a pretrained model. One way to train your own model that detects phrases is to use a bigram model: take a big Wikipedia dump, preprocess it into bigrams, and then train a word2vec model on the result. A good GitHub project that can help you achieve this is https://github.com/KeepFloyding/wikiNLPpy and a nice article on the topic is https://towardsdatascience.com/word2vec-for-phrases-learning-embeddings-for-more-than-one-word-727b6cf723cf. A minimal sketch of this pipeline is shown below.
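For illustration, here is a rough sketch of that approach using gensim's `Phrases` and `Word2Vec` (gensim 4.x API). The tiny in-memory corpus and the hyperparameters are placeholders, not taken from the linked project; in practice you would stream sentences from a Wikipedia dump.

```python
# Minimal sketch: learn bigram phrases, then train word2vec on the
# phrase-merged corpus. Replace `sentences` with a real tokenized corpus.
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases

sentences = [
    ["the", "drama", "film", "won", "an", "award"],
    ["she", "starred", "in", "a", "drama", "film"],
    ["the", "drama", "film", "was", "a", "hit"],
]

# Detect frequently co-occurring token pairs and merge them with "_",
# e.g. "drama film" -> "drama_film". Thresholds here are toy values.
bigram = Phrases(sentences, min_count=1, threshold=1)
phrased_sentences = [bigram[s] for s in sentences]

# Train word2vec on the phrase-merged corpus.
model = Word2Vec(phrased_sentences, vector_size=100, window=5, min_count=1)

# Inspect which merged phrase tokens ended up in the vocabulary.
print([w for w in model.wv.key_to_index if "_" in w])
```

With a real corpus you would raise `min_count` and `threshold` substantially so that only genuinely frequent collocations get merged.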
As stated in the description of Google's pre-trained word2vec model, that pre-trained model (the GoogleNews vectors) already contains some phrases (bigrams).
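If you go that route, a quick way to check which phrase tokens the GoogleNews model actually contains is to load it with gensim. This is only a sketch; it assumes the standard `GoogleNews-vectors-negative300.bin` file has already been downloaded locally, and the candidate tokens are just examples to probe.

```python
# Load Google's pre-trained GoogleNews vectors (binary format, ~3.6 GB)
# and probe for single-word and phrase tokens. gensim 4.x API.
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Multi-word entries are stored with underscores; check a few candidates.
for token in ["drama", "New_York", "drama_film"]:
    print(token, token in kv.key_to_index)
```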
Upvotes: 1