jacob mathew

Reputation: 153

pathos Pool statement hangs

My program does not proceed past the Pool(5) statement. I am using Python 3.6 on Windows Server, on a 64-bit virtual machine with 8 virtual CPUs.

The code is below:

import pathos.multiprocessing as mp

# nlp and textStr are defined elsewhere in the program (not shown)
poolObj = mp.Pool(5)
docs = poolObj.map(nlp, textStr)

It hangs at the Pool(5) statement. I also tried ProcessingPool(5), with the same result.

Upvotes: 1

Views: 858

Answers (2)

Nicolas

Reputation: 363

Possible explanation

I had a similar issue, and it took me ages to understand what happened. I eventually discovered that one of the processes had been killed by the OOM killer because it was using too much RAM.
Pathos could not detect this and kept waiting for the process to finish, even though it had been killed (and a new, idle one created in its place).

On Ubuntu, you can check the kernel messages to find out whether the OOM killer was triggered:

dmesg -T

Look for a line that looks something like this:

[Mon Jan 10 02:24:40 2022] Out of memory: Killed process 1420 (python) total-vm:14764496kB, anon-rss:13565716kB, file-rss:28kB, shmem-rss:0kB, UID:1000 pgtables:27652kB oom_score_adj:0
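
If you would rather not scan the log by eye, something like the following might help. It is only a sketch: the find_oom_kills helper and the regular expression are my own, and the pattern assumes the message format of the sample line above, which other kernel versions may word differently.

```python
import re

# Assumes the kernel message format shown in the sample line above;
# other kernel versions may word the message differently.
OOM_PATTERN = re.compile(r"Out of memory: Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return (pid, process_name) for each OOM-killer line in log_text."""
    return [(int(pid), name) for pid, name in OOM_PATTERN.findall(log_text)]


sample = (
    "[Mon Jan 10 02:24:40 2022] Out of memory: Killed process 1420 (python) "
    "total-vm:14764496kB, anon-rss:13565716kB, file-rss:28kB, shmem-rss:0kB, "
    "UID:1000 pgtables:27652kB oom_score_adj:0"
)
print(find_oom_kills(sample))  # [(1420, 'python')]
```

In practice you would pass it the text printed by dmesg -T instead of the sample string.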

To reproduce the problem

If you want to try it yourself, you can reproduce the same behaviour with the following code snippet:

import time
from pathos.multiprocessing import ProcessPool

def do_something(i):
    print(i, 'entering')
    time.sleep(2)
    print(i, 'returning')
    return i

with ProcessPool(2) as pool:
    results = pool.map(
        do_something,
        range(5)
    )

During execution, you can use htop to kill one of the subprocesses (the last two lines in my screenshot). If you do, the script ends up hanging: no CPU use, but the Python script never returns.

[Screenshot: htop showing the multiprocessing subprocesses]

Upvotes: 0

Mike McKerns

Reputation: 35247

I'm the pathos author. First, it helps to post a code snippet that people attempting to answer your question can execute. That gets you a better answer because the problem can be diagnosed more precisely (in this case, it might be a serialization issue, the freeze_support issue on Windows, or a build issue).

Here's what I can suggest in the absence of more details:

  • Do you have a C compiler? If not, then you aren't actually using multiprocess, which is what pathos intends to use. multiprocess is a fork of multiprocessing that has more capabilities. If you don't have a compiler, you need to install one and then rebuild multiprocess.
  • You don't need to run within __main__ if you are using multiprocess (see above). However, on Windows you will need to use pathos.helpers.freeze_support. It is required for pools on Windows in most cases.
  • If both of the above are fine, then I'd check whether your object serializes. On Windows, you can confirm that the object will pickle correctly for multiprocess with dill.check (in the dill package).

It might also be a combination of the above.
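
For the serialization point, a quick first pass could look like the sketch below. The can_serialize helper is hypothetical, not part of pathos or dill: it just attempts a dumps/loads round trip in the current process, using dill when it is installed and falling back to the stricter standard-library pickle otherwise. It is not a substitute for dill.check, which, as far as I know, also tests loading the object in a separate Python process.

```python
import pickle

try:
    import dill as serializer  # serializer used by multiprocess/pathos
except ImportError:
    serializer = pickle  # stdlib fallback; rejects more objects than dill


def can_serialize(obj):
    """Hypothetical helper: try a dumps/loads round trip in this process.

    Returns (True, "") on success, or (False, "<ErrorType>: <message>").
    """
    try:
        serializer.loads(serializer.dumps(obj))
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""


# A plain dict round-trips; a generator object does not.
print(can_serialize({"text": "hello"}))
print(can_serialize(i for i in range(3)))
```

You would call it on both the function and a sample of the data passed to map, for example can_serialize(nlp) and can_serialize(textStr[0]) with the names from the question.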

Upvotes: 2
