Imrul Huda

Reputation: 191

Why is parallel processing taking longer than the sequential code?

The idea is to update a particular pre-trained word2vec model with different sets of new corpora. I have the following code:

import gensim
from multiprocessing import Pool

# c1, c2, ... are each a list of 100 files
filelist = [c1, c2, c3, c4, c5, c6, c7, c8, c9, c10]

def update_model(files):
    # load a fresh copy of the pre-trained model
    trained_model = gensim.models.Word2Vec.load("model_both_100")
    # DocumentFeeder is an iterable over the documents in `files`
    docs = DocumentFeeder(files)
    trained_model.build_vocab(docs, update=True)
    trained_model.train(docs, total_examples=trained_model.corpus_count,
                        epochs=trained_model.epochs)

with Pool(processes=10) as P:
    P.map(update_model, filelist)

It takes ~13 minutes to run, but the non-parallel version (looping over filelist) takes ~11 minutes. Why is this happening? I'm running on a 12-core CPU.

Upvotes: 0

Views: 72

Answers (1)

gojomo

Reputation: 54173

Gensim's Word2Vec training already uses multiple threads – depending on the workers parameter at model creation. (The default is to use workers=3, but your model may have been initialized to use even more.)
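For illustration, a minimal sketch of where that thread count is fixed (the toy corpus and filename below are placeholders, not from the question):

import gensim

# Toy stand-in corpus; min_count=1 so the tiny vocabulary isn't discarded.
sentences = [["update", "a", "word2vec", "model"],
             ["workers", "controls", "training", "threads"]]

# `workers` is set at model creation (the default is 3) and stored on the
# model, so a later load() gets the same value back.
model = gensim.models.Word2Vec(sentences, workers=3, min_count=1)
model.save("tiny_model")

loaded = gensim.models.Word2Vec.load("tiny_model")
print(loaded.workers)  # -> 3: a loaded model keeps its original setting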

So you are launching 10 (heavyweight) processes, each separately loading a full-size model. That could easily trigger heavy memory usage & thus virtual-memory swapping.

Then each of those models does its own (single-threaded) vocabulary expansion, followed by training that uses one manager thread plus 3 or more worker threads. If they're all training simultaneously, that means 40 threads active within 10 OS processes on your 12-core processor. There's no reason to expect a speedup in such a situation, and the contention of more threads than cores, all accessing totally different loaded model memory ranges, could easily explain a slowdown.

Are you really trying to create 10 separate incrementally-updated models? (Do they get re-saved to 10 different filenames after the update-training?)
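If so, a plain serial loop that lets each training run use more of gensim's own worker threads would sidestep the process-level contention. A rough sketch, reusing the question's filelist and DocumentFeeder; the workers count and the output filenames are assumptions, not something from the question:

import gensim

for i, files in enumerate(filelist):
    # Fresh copy of the pre-trained model for each new corpus.
    trained_model = gensim.models.Word2Vec.load("model_both_100")
    trained_model.workers = 11  # assumption: let this one training use the cores
    docs = DocumentFeeder(files)
    trained_model.build_vocab(docs, update=True)
    trained_model.train(docs, total_examples=trained_model.corpus_count,
                        epochs=trained_model.epochs)
    trained_model.save(f"model_updated_{i}")  # placeholder output names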

Upvotes: 1
