krcoder

Reputation: 199

Default estimation method of Gensim's Word2vec Skip-gram?

I am trying to train word2vec skip-gram embeddings with NCE (noise contrastive estimation) rather than the conventional negative-sampling method, as a recent paper did (https://asistdl.onlinelibrary.wiley.com/doi/full/10.1002/asi.24421?casa_token=uCHp2XQZVV8AAAAA%3Ac7ETNVxnpqe7u9nhLzX7pIDjw5Fuq560ihU3K5tYVDcgQEOJGgXEakRudGwEQaomXnQPVRulw8gF9XeO). The paper has a replication GitHub repository (https://github.com/sandeepsoni/semantic-progressiveness) that relies mainly on gensim for word2vec, but the repository is poorly organized, so I cannot tell how the authors implemented NCE estimation with gensim's word2vec.

The authors just used gensim's word2vec with its default settings, without passing any options, so my question is: what is the default estimation method for gensim's word2vec skip-gram embeddings? Is it NCE? The gensim manual only says there is an option for negative sampling, and that if it is set to 0, no negative sampling is used. But then what estimation method is used? The relevant entry reads: negative (int, optional) – If > 0, negative sampling will be used, the int for negative specifies how many “noise words” should be drawn (usually between 5-20). If set to 0, no negative sampling is used.

Thank you in advance; I look forward to hearing from you soon!

Upvotes: 0

Views: 362

Answers (1)

gojomo

Reputation: 54153

The default parameters of the Gensim Word2Vec model, for an unmodified Gensim library, are listed in the Gensim docs. Here's a link to the docs for the Word2Vec constructor in the current version (4.1), showing all default parameter values:

https://radimrehurek.com/gensim/models/word2vec.html#gensim.models.word2vec.Word2Vec

class gensim.models.word2vec.Word2Vec(sentences=None, corpus_file=None, vector_size=100, alpha=0.025, window=5, min_count=5, max_vocab_size=None, sample=0.001, seed=1, workers=3, min_alpha=0.0001, sg=0, hs=0, negative=5, ns_exponent=0.75, cbow_mean=1, hashfxn=<built-in function hash>, epochs=5, null_word=0, trim_rule=None, sorted_vocab=1, batch_words=10000, compute_loss=False, callbacks=(), comment=None, max_final_vocab=None, shrink_windows=True)

Two of those parameters, hs=0 and negative=5, mean the default mode has hierarchical softmax disabled and negative sampling enabled with 5 negative words. These have been the defaults of Gensim's Word2Vec for many versions, so even if other code is using an older version, this is likely the mode used (unless explicit parameters or modified/overridden code changed them).
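To check this against whatever Gensim version is actually installed, the defaults can be read straight off the constructor signature. The following is only a minimal sketch: `constructor_defaults` and `describe_mode` are hypothetical helper names introduced here, not part of Gensim, and the sketch assumes `Word2Vec.__init__` is an ordinary Python function that `inspect.signature` can read. Note also that `sg=0` in the signature above selects CBOW, so skip-gram has to be requested explicitly with `sg=1`.

```python
import inspect

try:
    from gensim.models import Word2Vec  # requires `pip install gensim`
except ImportError:
    Word2Vec = None  # the helpers below still work without Gensim


def constructor_defaults(cls, names):
    """Return the default values of the named __init__ parameters of cls."""
    params = inspect.signature(cls.__init__).parameters
    return {name: params[name].default for name in names}


def describe_mode(sg, hs, negative):
    """Summarize the training mode implied by sg / hs / negative.

    Hypothetical helper: it follows the parameter descriptions in the
    Gensim docs (sg=1 is skip-gram, otherwise CBOW; hs=1 enables
    hierarchical softmax; negative > 0 enables negative sampling).
    """
    architecture = "skip-gram" if sg == 1 else "CBOW"
    objectives = []
    if hs:
        objectives.append("hierarchical softmax")
    if negative > 0:
        objectives.append(f"negative sampling ({negative} noise words)")
    if not objectives:
        objectives.append("no output-layer objective enabled")
    return f"{architecture} with " + " and ".join(objectives)


if Word2Vec is not None:
    # Inspect the installed library rather than trusting a docs snapshot.
    defaults = constructor_defaults(Word2Vec, ("sg", "hs", "negative"))
    print(defaults)
    print(describe_mode(**defaults))
```

To train skip-gram with negative sampling regardless of the installed defaults, pass the settings explicitly, for example `Word2Vec(sentences, sg=1, hs=0, negative=5)`, where `sentences` stands for your own tokenized corpus.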

Upvotes: 2
