Reputation: 5730
I used Gensim's Word2Vec to train a model for finding the most similar words.
My dataset is all posts from my college community site.
Each data point is a single string of the form:
(title) + (contents) + (all comments)
For example,
data[0] => "This is title. Contents is funny. What so funny?. Not funny for me"
So I have around 400,000 data points like the one above. I tokenize each of them and train Word2Vec on these data.
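For reference, my preparation and training look roughly like this (a simplified sketch; the parameter values are only placeholders):

```python
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# data is my list of ~400,000 strings: (title) + (contents) + (all comments)
data = ["This is title. Contents is funny. What so funny?. Not funny for me"]

# Tokenize each string into a list of lowercase words
sentences = [simple_preprocess(doc) for doc in data]

# Train Word2Vec on the tokenized posts (parameter values are just examples)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)
```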
I wonder whether it is possible to make Word2Vec consider a WEIGHT, which means: if I give a weight to a certain data point, Word2Vec trains on it in a way that each word in that data point ends up with a stronger relationship (higher similarity).
For example, if I gave a weight of 5 to the data point "I like Pizza, Chicken", then the words Pizza and Chicken (or like and Pizza, etc.) would have stronger relations than the words of other data points.
Would that be possible?
Sorry for the poor explanation; I'm not a native English speaker. If you need more detailed info, please post a comment.
Upvotes: 2
Views: 1400
Reputation: 54173
There's no such configurable weighting in the definition of the word2vec algorithm, or the gensim implementation.
You could try repeating those text examples that you want to have more influence. (Ideally, such repetitions wouldn't be back-to-back, but shuffled among the entire dataset.)
As a result, those examples will affect the underlying model's training more often, for a greater proportion of the total training time – shifting the relative positioning of the involved words, compared to less-repeated examples. That might have the end result you're seeking.
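A minimal sketch of that oversampling idea, assuming hypothetical tokenized documents and integer weights (the parameter values are only placeholders):

```python
import random
from gensim.models import Word2Vec

# Hypothetical tokenized documents and per-document weights (small positive integers)
docs = [
    ["this", "is", "title", "contents", "is", "funny", "what", "so", "funny", "not", "funny", "for", "me"],
    ["i", "like", "pizza", "chicken"],
]
weights = [1, 5]  # repeat the second document 5 times to give it more influence

# Oversample: repeat each document according to its weight, then shuffle
# so the repeats aren't back-to-back
corpus = [doc for doc, w in zip(docs, weights) for _ in range(w)]
random.shuffle(corpus)

# min_count=1 only because this toy corpus is tiny; on the real 400,000 posts
# you'd keep a more typical min_count
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, workers=4, epochs=10)
print(model.wv.most_similar("pizza"))
```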
Upvotes: 2