Imrul Huda

Reputation: 191

Why is parallel processing taking longer than the sequential code?

The idea is to update a particular pre-trained word2vec model with different sets of new corpora. I have the following code:

import gensim
from multiprocessing import Pool

# c1, c2, ... are each a list of 100 files
filelist = [c1, c2, c3, c4, c5, c6, c7, c8, c9, c10]

def update_model(files):
    # load a fresh copy of the pre-trained model
    trained_model = gensim.models.Word2Vec.load("model_both_100")
    # DocumentFeeder is an iterable over the documents in `files`
    docs = DocumentFeeder(files)
    trained_model.build_vocab(docs, update=True)
    trained_model.train(docs, total_examples=trained_model.corpus_count,
                        epochs=trained_model.epochs)

with Pool(processes=10) as P:
    P.map(update_model, filelist)

It takes ~13 minutes to run, but the non-parallel version (looping over filelist) takes ~11 minutes. Why is this happening? I'm running on a 12-core CPU.

Upvotes: 0

Views: 72

Answers (1)

gojomo

Reputation: 54173

Gensim's Word2Vec training already uses multiple threads – depending on the workers parameter at model creation. (The default is to use workers=3, but your model may have been initialized to use even more.)
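For illustration, a minimal sketch of where that thread count is fixed (the toy corpus and filename below are placeholders, not from the question):

import gensim

# Toy stand-in corpus; min_count=1 so the tiny vocabulary isn't discarded.
sentences = [["update", "a", "word2vec", "model"],
             ["workers", "controls", "training", "threads"]]

# `workers` is set at model creation (the default is 3) and stored on the
# model, so a later load() gets the same value back.
model = gensim.models.Word2Vec(sentences, workers=3, min_count=1)
model.save("tiny_model")

loaded = gensim.models.Word2Vec.load("tiny_model")
print(loaded.workers)  # -> 3: a loaded model keeps its original setting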

So you are launching 10 (heavyweight) processes, each separately loading a full-size model. That could easily trigger heavy memory usage & thus virtual-memory swapping.

Then each of those models does its own (single-threaded) vocabulary expansion, followed by training that uses one manager thread plus 3 or more worker threads. If they're all training simultaneously, that means 40 threads active within 10 OS processes on your 12-core processor. There's no reason to expect a speedup in such a situation, and the contention of more threads than cores, all accessing totally different loaded model memory ranges, could easily explain a slowdown.

Are you really trying to create 10 separate incrementally-updated models? (Do they get re-saved to 10 different filenames after the update-training?)
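If so, a plain serial loop that lets each training run use more of gensim's own worker threads would sidestep the process-level contention. A rough sketch, reusing the question's filelist and DocumentFeeder; the workers count and the output filenames are assumptions, not something from the question:

import gensim

for i, files in enumerate(filelist):
    # Fresh copy of the pre-trained model for each new corpus.
    trained_model = gensim.models.Word2Vec.load("model_both_100")
    trained_model.workers = 11  # assumption: let this one training use the cores
    docs = DocumentFeeder(files)
    trained_model.build_vocab(docs, update=True)
    trained_model.train(docs, total_examples=trained_model.corpus_count,
                        epochs=trained_model.epochs)
    trained_model.save(f"model_updated_{i}")  # placeholder output names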

Upvotes: 1
