Falco Peregrinus

Reputation: 587

Python multiprocessing doesn't finish all tasks

I have a lot of files that need to be processed by some software. They don't need to be processed in any particular order.
Say I have 12 files. I split them into three lists and send each list to a different process:

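import glob
import subprocess
from multiprocessing import Pool

def terminal_astrometry(command):
    # run the external program and capture its stdout
    return subprocess.run(command, stdout=subprocess.PIPE)

def run2(files, *args):
    for ffit in files:
        # assumed placeholder: the real command line was trimmed from the question
        command = ["solve-field", ffit]
        terminal_astrometry(command)

if __name__ == "__main__":
    # import all files (src_path is set elsewhere in the original script)
    files = glob.glob(src_path + "*.fits")
    files_list = [files[0::3], files[1::3], files[2::3]]

    num_processors = 3  # one worker per sub-list
    pool = Pool(processes=num_processors)  # create a pool of worker processes
    output = pool.map(run2, files_list)  # run the sub-lists in parallel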

The problem is that the program sometimes doesn't process all of the files: on one run 11 files get processed and one does not, on another 9 finish and 3 are skipped. Sometimes it does finish all tasks (processes all of the files).

Essentially, in the run2() function I call the software that I want to run in parallel (Astrometry.net) on every file that run2() receives.

EDIT2: I trimmed the run2() function before posting it here, because it contains a lot of calculations (statistics) that are not relevant to this problem (at least I think so).

Upvotes: 0

Views: 712

Answers (2)

D Hudson

Reputation: 1092

Your symptoms sound like a race condition. However, pool.map blocks the main process until all tasks have finished, so the code will not progress past that line before then. Therefore, I think the problem may be within the run2 function. Could you post its code?

Edit: This answer previously also included the following text; the question has since been edited:
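To illustrate the blocking behaviour of map(), here is a minimal sketch. It uses multiprocessing.pool.ThreadPool, which exposes the same map() method as Pool, only so that the snippet runs without a main guard. The names slow_task and run_all are hypothetical and not part of the question's code.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_task(n):
    # Larger inputs sleep less, so tasks finish in the reverse of input order.
    time.sleep(0.05 * (4 - n))
    return n * n


def run_all(inputs, workers=3):
    with ThreadPool(processes=workers) as pool:
        start = time.monotonic()
        # map() does not return until every task has produced a result.
        results = pool.map(slow_task, inputs)
        elapsed = time.monotonic() - start
    return results, elapsed
```

Calling run_all([1, 2, 3]) returns one result per input, in input order, and only after the slowest task has finished. If files are still skipped, the cause is more likely inside the worker function than in the pool.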

You are calling run2 twice for each file - once asynchronously with the pool, and once in the main process. Depending on the logic within this function, this could be the cause of the odd behaviour you're seeing.

Upvotes: 3

Falco Peregrinus

Reputation: 587

The software that I'm calling inside the run2() function is causing the problem. Every run tries to write its stdout to the same file, which is why not all of the tasks complete.
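One possible way around this is to give every run its own output file and to return the exit code, so that failed files are visible. The sketch below is only an illustration under assumptions. build_command, run_one and process_all are hypothetical names, and the command is a stand-in that just prints the file name, not the real Astrometry.net call. Since each task only waits on an external process, a thread pool is used here; the same functions should also work with multiprocessing.Pool under an if __name__ == "__main__": guard.

```python
import subprocess
import sys
import tempfile
from multiprocessing.pool import ThreadPool
from pathlib import Path


def build_command(path):
    # Hypothetical stand-in for the real external program.
    code = "import sys; print('processed', sys.argv[1])"
    return [sys.executable, "-c", code, str(path)]


def run_one(job):
    path, log_dir = job
    # One log file per input file, so runs never share an output file.
    log_path = Path(log_dir) / (Path(path).name + ".log")
    with open(log_path, "w") as log:
        result = subprocess.run(
            build_command(path), stdout=log, stderr=subprocess.STDOUT
        )
    return path, result.returncode


def process_all(paths, log_dir=None, workers=3):
    # Assumption: logs go to a fresh temporary directory unless one is given.
    log_dir = log_dir or tempfile.mkdtemp(prefix="logs_")
    jobs = [(p, log_dir) for p in paths]
    with ThreadPool(processes=workers) as pool:
        results = pool.map(run_one, jobs)
    # Collect the inputs whose command exited with a non-zero status.
    failed = [p for p, code in results if code != 0]
    return results, failed, log_dir
```

Passing one file per task, rather than one sub-list per worker, also lets the pool balance the load when some files take longer than others.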

Upvotes: 0
