I want to construct word embeddings for documents using GloVe. I know how to obtain vector embeddings for single words (unigrams) as follows (for their example text document).
$ git clone https://github.com/stanfordnlp/glove
$ cd glove && make
$ ./demo.sh
Now I want to obtain vector embeddings for bigrams. Is that possible with GloVe? If yes, how?
Upvotes: 1
Views: 2618
Reputation: 118
I don't think pretrained bigram vectors are available, but you could produce them yourself by preprocessing a corpus. For example, if a document in your corpus looks like this:
GloVe is love
You can format it like this:
START_GloVe GloVe_is is_love love_END
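That preprocessing step can be sketched in Python like this (a minimal illustration; the function name and the START/END markers are just conventions chosen here, not anything GloVe requires):

```python
def to_bigram_tokens(line):
    # Pad the sentence with START/END markers, then join each pair of
    # consecutive words with an underscore to form one bigram token.
    words = ["START"] + line.split() + ["END"]
    return " ".join(f"{a}_{b}" for a, b in zip(words, words[1:]))

print(to_bigram_tokens("GloVe is love"))
# START_GloVe GloVe_is is_love love_END
```

Running every line of your corpus through something like this gives you a new corpus whose "words" are bigrams, which GloVe's training pipeline will then embed like any other vocabulary.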
Then train a set of embeddings on this corpus as usual. You could also have a look at Word2vec, as in this similar post.
Upvotes: 1