Vishal Suryavanshi

Reputation: 377

What is the workers parameter in Word2Vec (NLP)?

In the code below, I don't understand the meaning of the workers parameter:

model = Word2Vec(sentences, size=300000, window=2, min_count=5, workers=4)

Upvotes: 1

Views: 7610

Answers (3)

Hakan Özler

Reputation: 988

You can use effective_n_jobs to check how many threads a given value resolves to on your machine.

from gensim.utils import effective_n_jobs

effective_n_jobs(1)
effective_n_jobs(-1)
effective_n_jobs(None)
effective_n_jobs(12)
effective_n_jobs(10)

# outputs (on a machine with 12 CPU cores)
1
12
1
12
10
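The outputs above follow the scikit-learn/joblib-style n_jobs convention. As a rough illustration of that convention only (this is an assumption about the rule, not gensim's source; resolve_n_jobs and its cpu_total argument are hypothetical names), it could be sketched as:

```python
import os

def resolve_n_jobs(n_jobs, cpu_total=None):
    """Map an n_jobs-style value to a concrete thread count (illustrative only)."""
    if cpu_total is None:
        # os.cpu_count() can return None when the count is unknown
        cpu_total = os.cpu_count() or 1
    if n_jobs is None:
        return 1  # None means "no parallelism"
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 means all cores, -2 means all but one, and so on
        return max(cpu_total + 1 + n_jobs, 1)
    return n_jobs  # positive values are passed through unchanged
```

Note that a positive value is returned as-is, so it is not capped at the number of cores.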

Upvotes: 0

gojomo

Reputation: 54173

As others have mentioned, workers controls the number of independent threads doing simultaneous training.

In general, you'll never want to use more workers than the number of CPU cores.
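One way to apply that rule is to clamp the requested value to the core count before passing it as workers. A minimal sketch using only the standard library; clamp_workers is a hypothetical helper, not part of gensim, and os.cpu_count() reports logical cores, which may be more than the physical count:

```python
import os

def clamp_workers(requested, cores=None):
    """Return a workers value between 1 and the number of CPU cores."""
    if cores is None:
        # os.cpu_count() can return None when the count is unknown
        cores = os.cpu_count() or 1
    return max(1, min(requested, cores))
```

It could then be used as, for example, workers=clamp_workers(4) in the Word2Vec call.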

But further, the gensim Word2Vec implementation faces a bit more thread-to-thread bottlenecking due to issues like the Python "Global Interpreter Lock" ('GIL') and some of its IO/corpus-handling design decisions.

So on systems with a large number of cores, such as more than 16, the optimal workers value for maximum throughput is usually less than the full count of cores – often in the 3-12 range. (The exact number will depend on other aspects of your corpus-handling and chosen metaparameters, and for now is most often discovered through trial-and-error.)
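The trial-and-error search can be as simple as timing one training run per candidate value. A minimal sketch, assuming you supply a train callable that takes a workers value and runs training on your own corpus; fastest_workers is a hypothetical name:

```python
import time

def fastest_workers(train, candidates):
    """Time train(workers) for each candidate; return (best, timings)."""
    timings = {}
    for workers in candidates:
        start = time.perf_counter()
        train(workers)  # one full training run with this many threads
        timings[workers] = time.perf_counter() - start
    # The candidate with the shortest wall-clock time wins
    best = min(timings, key=timings.get)
    return best, timings
```

For example, train might be lambda w: Word2Vec(sentences, workers=w), tried over candidates such as [3, 6, 9, 12]. A single run per value is noisy, so treat the result as a rough guide.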

If your corpus is already in a specific text format, the latest gensim release, 3.6.0, offers a new input mode that allows better scaling of workers all the way up to the count of CPU cores. See this section of the release notes about the new corpus_file parameter for details.
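A minimal sketch of preparing such a file, assuming the expected format is one sentence per line with tokens separated by single spaces (check the gensim documentation for your version). Both function names are hypothetical, and gensim is imported lazily so the writer runs without it:

```python
def write_line_sentence_file(sentences, path):
    """Write tokenized sentences one per line, tokens separated by spaces."""
    with open(path, "w", encoding="utf-8") as handle:
        for tokens in sentences:
            handle.write(" ".join(tokens) + "\n")
    return path

def train_from_file(path, workers):
    """Train from a corpus file instead of an in-memory iterable."""
    # Imported here so the writer above does not require gensim
    from gensim.models import Word2Vec
    # corpus_file requires gensim 3.6.0 or later
    return Word2Vec(corpus_file=path, workers=workers)
```

With this input mode, trying a workers value equal to the core count becomes a reasonable starting point.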

Upvotes: 1

SUBHOJEET

Reputation: 408

workers = use this many worker threads to train the model (=faster training with multicore machines).

If your system has 2 cores and you specify workers=2, training runs in two parallel threads.

By default, workers=3 in gensim's Word2Vec; setting workers=1 means no parallelization.

Upvotes: 4
