Sergey Luchko

Reputation: 3336

Is Python multiprocessing Pool thread safe?

I have a Django project. If I create a package-level variable that holds a Pool() object, and then use that pool from Django views (which run concurrently), will this be thread safe? Are there other ways to do it?

from multiprocessing import Pool
general_executor_pool = Pool()

Upvotes: 5

Views: 2579

Answers (2)

Goulou

Reputation: 738

For the record, I had to check this, and it seems that multiprocessing.pool.Pool is indeed thread-safe. The following code does not trigger an AssertionError (tested with Python 3.6.9):

import random
import time
import multiprocessing.pool
from threading import Thread

pool = multiprocessing.pool.Pool()

count = 100

def return_value(value):
    time.sleep(random.random())
    return value

def call_return_value():
    counter_start = random.randint(0, 100)
    result = list(range(counter_start, counter_start + count))
    pool_result = pool.imap_unordered(return_value, range(counter_start, counter_start + count), chunksize=1)
    pool_result = list(pool_result)
    assert set(pool_result) == set(result)

tl = [Thread(target=call_return_value) for _ in range(24)]
for t in tl:
    t.start()
for t in tl:
    t.join()

Basically, this code starts a process pool and launches 24 threads, each calling the return_value function via this pool. That function returns its value after waiting for a random delay (between 0 and 1 s).

Of course, pool_result is no longer ordered, but it contains the correct set of elements, and this holds for all threads: values do not get mixed.
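For what it's worth, the same property can be checked with apply_async instead of imap_unordered. This is a sketch, not part of the original test; the square function and the batch sizes are arbitrary. Each thread submits its own batch to the shared pool and should get back exactly the values it submitted:

```python
import multiprocessing.pool
from threading import Thread

pool = multiprocessing.pool.Pool(4)

def square(x):
    return x * x

results = {}

def submit(offset):
    # Each thread submits its own batch of 50 tasks to the shared pool.
    async_results = [pool.apply_async(square, (offset + i,)) for i in range(50)]
    # Each thread writes to its own key, so the dict is not contended.
    results[offset] = [r.get() for r in async_results]

threads = [Thread(target=submit, args=(n * 100,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each thread's results match its own inputs - values from
# concurrent submissions do not get mixed.
for offset, values in results.items():
    assert values == [(offset + i) ** 2 for i in range(50)]
```

apply_async returns per-call AsyncResult handles, so each thread can collect its own results without relying on iteration order at all.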

Upvotes: 0

user3681414

Reputation: 109

I found this question via Google because I was asking the same question myself. Anecdotally, I can say NO, it is not, because I recently debugged a piece of software that suffered from race conditions. Here's how it went:

  1. A master process ran in a loop and, every 3 minutes, spawned a multiprocessing pool in a new thread with a list of ~1000 accounts to be acted on.
  2. The thread called multiprocessing.Pool(processes=32), then pool.map(func, accounts). This opened 32 processes and dispatched each account, one by one, to an available process.
  3. Unbeknownst to the original author, this work took far longer than 3 minutes to complete. So what happened the next time a thread was spawned to create a multiprocessing pool? Did it spawn 32 new processes, for a total of 64? In practice it did not. Instead, my results were scrambled and showed signs that multiple threads were acting on my data non-deterministically.

I'd love to trace through the multiprocessing module to see whether it is thread-unsafe by design, or to get an answer from someone in the know. Anecdotally, at least, I have witnessed first hand that it is not thread safe.

Upvotes: 7
