OneAndOnly

Reputation: 1056

Force gensim's word2vec vectors to be positive?

Is there any way in gensim to force the learned word2vec vectors to be all positive (that is, every element of every vector is positive)? I am working on a different task that needs these vectors to be positive (the reason is really complicated, so don't ask why).

So what is the easiest way for me to force gensim to learn positive vectors?

Upvotes: 0

Views: 564

Answers (1)

gojomo

Reputation: 54173

There is no built-in feature of Gensim that would allow this extra constraint/regularization to be applied during training.

You should probably try to explain your 'really complicated' reason for this idiosyncratic request. There might be a better way to achieve the real end-goal, rather than shoehorning vectors that are typically bushy-and-balanced around the origin into a non-negative representation.

Notably, a paper called 'All-but-the-Top: Simple and Effective Postprocessing for Word Representations' has suggested word-vectors can be improved by postprocessing to ensure they are more balanced around the origin, rather than less (as seems a reliable side-effect of typical negative-sampling configurations).
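For reference, that paper's postprocessing is, as I understand it, roughly: subtract the mean vector, then remove each vector's projection onto the top few principal components. Here is a minimal NumPy sketch under that reading. The names `all_but_the_top` and `n_components` are my own illustrative choices, not a Gensim API, and how many components to drop is something you'd have to tune.

```python
import numpy as np

def all_but_the_top(vectors, n_components):
    """Center the vectors, then remove their top principal directions.

    vectors: 2-D array of shape (n_words, n_dims).
    n_components: how many leading principal directions to remove
                  (a tunable assumption, not a fixed rule).
    """
    # Step 1: subtract the mean vector so the set is centered on the origin.
    centered = vectors - vectors.mean(axis=0)
    # Step 2: the rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_components]
    # Step 3: subtract each vector's projection onto those top directions.
    return centered - centered @ top.T @ top
```

Note that this moves the vectors toward being balanced around the origin, which is the opposite direction from an all-positive representation.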

If you're still interested to experiment in the opposite direction – transforming usual word2vec word-vectors into a representation where all dimensions are positive – I can think of a number of trivial, superficial ways to achieve that. I have no idea whether they'd actually preserve, or ruin, beneficial properties in the vectors – but you could try them, and see. For example:

  • You could try simply setting all negative dimensions to 0.0 - truncation. (Loses lots of info but might give a quick indication if a dirt-simple experiment gives you any of the benefits you seek.)
  • You could find the most-negative value that appears in any dimension of any vector, then add its absolute value to every dimension of every vector. Voila! No vector dimension is now lower than 0.0. (You could also try this in a per-dimension manner: shift dimension #0 only by the lowest dimension #0 value, and so on. Or, try other re-scalings of each dimension such that the previously-highly-negative values become 0.0, and the previously-highly-positive values stay where they are or only shift a little.)
  • You could try turning every dimension in the original word-vectors into two dimensions in a transformed set: one that's the original positive value, or 0.0 if it was negative, and a 2nd dimension that's the absolute value of the original negative value, or 0.0 if it was positive. (Or similarly: one dimension that's the absolute-value of the original value, and one dimension that's 0.0 or 1.0 depending on whether original value was negative or positive.)
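To make those options concrete, here's a rough NumPy sketch of each one. It assumes you already have the word-vectors as a 2-D array of shape (n_words, n_dims); in Gensim that would typically be something like `model.wv.vectors`, though check your version. All function names here are my own illustrative ones, not part of Gensim.

```python
import numpy as np

def truncate_negatives(vectors):
    """Option 1: set every negative value to 0.0."""
    return np.maximum(vectors, 0.0)

def shift_global(vectors):
    """Option 2: shift everything up by the single most-negative value."""
    lowest = min(vectors.min(), 0.0)  # no shift if nothing is negative
    return vectors - lowest

def shift_per_dimension(vectors):
    """Option 2 variant: shift each dimension by its own most-negative value."""
    lowest = np.minimum(vectors.min(axis=0, keepdims=True), 0.0)
    return vectors - lowest

def split_pos_neg(vectors):
    """Option 3: one block holds the positive parts, one the negated negative parts."""
    return np.concatenate(
        [np.maximum(vectors, 0.0), np.maximum(-vectors, 0.0)], axis=1
    )

def split_abs_sign(vectors):
    """Option 3 variant: absolute values, plus a 0.0/1.0 flag per dimension.

    The flag is 1.0 where the original value was >= 0, else 0.0.
    """
    flags = (vectors >= 0).astype(vectors.dtype)
    return np.concatenate([np.abs(vectors), flags], axis=1)
```

The two "split" variants double the dimensionality but keep all the original information (you can reconstruct the input from them), whereas truncation throws away everything that was negative. None of this says anything about whether downstream quality survives; that's still something you'd have to test.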

There are probably other, more sophisticated factorizations/decompositions for re-representing the full set of word-vectors as a transformed array with only non-negative individual values. I don't know them offhand, but it might be worth searching for them.

And, whether any of these transformations work for your next steps, who knows? But it might be worth trying. (And if any of these offer surprisingly good results, it'd be great to hear in a followup comment!)

Upvotes: 3
