user8566323

Reputation:

N-grams in GloVe

I want to construct word embeddings for documents using GloVe. I know how to obtain vector embeddings for single words (unigrams) as follows, using their example text corpus.

$ git clone http://github.com/stanfordnlp/glove
$ cd glove && make
$ ./demo.sh

Now I want to obtain vector embeddings for bigrams. For example:

  1. "New york" -> instead of "New", and "york"
  2. "machine learning" -> instead of "machine", and "learning"

Is it possible to do this in GloVe? If so, how?

Upvotes: 1

Views: 2618

Answers (1)

perfall

Reputation: 118

I don't think pre-trained bigram vectors are available, but you could produce them yourself by preprocessing a corpus. For example, if a document in your corpus looks like this:

GloVe is love

You can format it like this:

START_GloVe GloVe_is is_love love_END

Then train a set of embeddings on this corpus as usual. You could also have a look at word2vec, as in this post, which covers a similar question.
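If it helps, here is a minimal sketch of that preprocessing in Python. The file names corpus.txt and corpus_bigrams.txt are placeholders for your own corpus; the underscore-joined tokens and START/END markers follow the format shown above.

# Sketch: turn each line of a plain-text corpus into bigram tokens
# ("GloVe is love" -> "START_GloVe GloVe_is is_love love_END"),
# so GloVe treats each bigram as a single vocabulary item.

def to_bigram_tokens(line):
    words = ["START"] + line.split() + ["END"]
    # Join each adjacent pair with an underscore: ("GloVe", "is") -> "GloVe_is"
    return " ".join(f"{a}_{b}" for a, b in zip(words, words[1:]))

with open("corpus.txt", encoding="utf-8") as src, \
     open("corpus_bigrams.txt", "w", encoding="utf-8") as dst:
    for line in src:
        line = line.strip()
        if line:
            dst.write(to_bigram_tokens(line) + "\n")

You would then point the training pipeline from demo.sh at corpus_bigrams.txt instead of the original corpus, and the resulting vectors will be one per bigram token.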

Upvotes: 1
