Reputation: 5730
I used Gensim's Word2Vec to train a model for finding the most similar words.
My dataset is all posts from my college community site.
Each data point is a single string of the form:
(title) + (contents) + (all comments)
For example,
data[0] => "This is title. Contents is funny. What so funny?. Not funny for me"
So I have around 400,000 data points like the one above. I tokenize each of them and train Word2Vec on these data.
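For reference, my preparation and training look roughly like this (a simplified sketch; the parameter values are only placeholders):

```python
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# data is my list of ~400,000 strings: (title) + (contents) + (all comments)
data = ["This is title. Contents is funny. What so funny?. Not funny for me"]

# Tokenize each string into a list of lowercase words
sentences = [simple_preprocess(doc) for doc in data]

# Train Word2Vec on the tokenized posts (parameter values are just examples)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)
```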
I wonder whether it is possible to make Word2Vec consider a WEIGHT, which means: if I give a weight to a certain data point, Word2Vec trains on it in a way that each word in that data point ends up with a stronger relationship (higher similarity).
For example, if I gave a weight of 5 to the data point "I like Pizza, Chicken", then the words Pizza and Chicken (or like and Pizza, etc.) would have stronger relations than the words of other data points.
Would that be possible?
Sorry for the poor explanation; I'm not a native English speaker. If you need more detailed info, please post a comment.
Upvotes: 2
Views: 1400
Reputation: 54173
There's no such configurable weighting in the definition of the word2vec algorithm, or the gensim implementation.
You could try repeating those text examples that you want to have more influence. (Ideally, such repetitions wouldn't be back-to-back, but shuffled among the entire dataset.)
As a result, those examples will affect the underlying model's training more often, for a greater proportion of the total training time – shifting the relative positioning of the involved words, compared to less-repeated examples. That might have the end result you're seeking.
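A minimal sketch of that oversampling idea, assuming hypothetical tokenized documents and integer weights (the parameter values are only placeholders):

```python
import random
from gensim.models import Word2Vec

# Hypothetical tokenized documents and per-document weights (small positive integers)
docs = [
    ["this", "is", "title", "contents", "is", "funny", "what", "so", "funny", "not", "funny", "for", "me"],
    ["i", "like", "pizza", "chicken"],
]
weights = [1, 5]  # repeat the second document 5 times to give it more influence

# Oversample: repeat each document according to its weight, then shuffle
# so the repeats aren't back-to-back
corpus = [doc for doc, w in zip(docs, weights) for _ in range(w)]
random.shuffle(corpus)

# min_count=1 only because this toy corpus is tiny; on the real 400,000 posts
# you'd keep a more typical min_count
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, workers=4, epochs=10)
print(model.wv.most_similar("pizza"))
```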
Upvotes: 2