Kumar S

Reputation: 11

Python multiprocessing exit condition error intermittently

A multiprocessing job runs several tasks in parallel. I want to stop the remaining parallel or dependent tasks as soon as one of them fails, and to leave the loop once all of them have completed.

The problem is with the 1st print. That branch should be entered only when a process has finished with a non-zero exit code; it then stops the remaining jobs by breaking out of the while loop. However, even when a process completes successfully with exit code 0, the branch is entered intermittently and the remaining jobs are stopped. What is going wrong here?

(Screenshots of a failed run and a passed run were attached.)

The main job that triggers the multiprocess tasks:

import multiprocessing


def run_block(index):
    print(index)

    # do some execution

def run_blocks(target, dict_blocks):
    process = []
    for (index, (block_id, depend_on)) in \
        enumerate(dict_blocks.items()):
        # args must be a tuple, hence the trailing comma
        proc = multiprocessing.Process(target=run_block, args=(index,))
        process.append(proc)
        proc.start()
    check_exit(process)


def check_exit(process):
    done = False
    process_count = len(process)
    count = 0
    completed = []
    while not done:
        for proc in process:
            if proc.exitcode != 0 and proc.exitcode != None:
                print ('1st', proc, count, done, proc.exitcode)
                done = True
                break
            if proc.exitcode == 0 and proc.pid not in completed:
                print ('2nd', proc, count, done, proc.exitcode)
                completed.append(proc.pid)
                count += 1
            if count == process_count:
                print ('3rd', proc, count, done)
                done = True
                break
    stop_process_exit(process, count, process_count, done)


def stop_process_exit(
    process,
    count,
    process_count,
    done,
    ):
    print (process_count, count, done, process)
    for proc in process:
        if proc.is_alive():
            proc.terminate()
    if done == True and count != process_count:
        exit(1)

Upvotes: 1

Views: 500

Answers (1)

Paul Cornelius

Reputation: 10999

Your processes run independently of the main program, so the value of proc.exitcode can change at any moment: it is None while the process is running and becomes the exit code as soon as the process finishes. In this statement:

if proc.exitcode != 0 and proc.exitcode != None

you read proc.exitcode twice. Suppose proc.exitcode is None when this line starts executing. Python does the first comparison (None != 0) and it evaluates to True. Now suppose the process finishes at that exact moment, so proc.exitcode becomes zero. Python performs the second comparison (0 != None), and that is also True! So your print statement fires, and you break out of the loop when you really don't want to.
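The race described above can be reproduced deterministically with a small simulation. This is only a sketch: RacyProc is a hypothetical stand-in, not a real multiprocessing.Process, and it simply pretends that the process exits with code 0 between the first and the second read of exitcode. The two helper functions are also made up for illustration.

```python
class RacyProc:
    """Hypothetical stand-in for a process that exits with code 0
    between the first and the second read of exitcode."""

    def __init__(self):
        self._reads = 0

    @property
    def exitcode(self):
        self._reads += 1
        # First read: still running (None). Later reads: finished with 0.
        return None if self._reads == 1 else 0


def failed_two_reads(proc):
    # Same logic as the condition in the question: exitcode is read twice.
    return proc.exitcode != 0 and proc.exitcode is not None


def failed_one_read(proc):
    # Snapshot the value once, then compare the snapshot.
    code = proc.exitcode
    return code is not None and code != 0


print(failed_two_reads(RacyProc()))  # True: a clean exit is misreported
print(failed_one_read(RacyProc()))   # False
```

Reading the value into a local variable once is an alternative way to avoid the problem, because both comparisons then see the same snapshot.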

Of course I don't know this is what's happening since I can't run your program, but the evidence points that way.

I would change the loop like this:

for proc in process:
    if proc.is_alive():
        continue
    if proc.exitcode != 0:
        print ('1st', proc, count, done, proc.exitcode)
        done = True
        break
    # ... everything else is not changed
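For completeness, here is one way the whole check could look when built around is_alive(). It is a sketch under assumptions, not a drop-in replacement for your code: the function name wait_all_or_fail and the poll_interval parameter are invented here, and the short sleep is added only so the loop does not spin at full speed.

```python
import time


def wait_all_or_fail(procs, poll_interval=0.05):
    """Return True if every process exited with code 0.

    On the first non-zero exit code, terminate the processes that are
    still running and return False.
    """
    pending = list(procs)
    while pending:
        still_running = []
        for proc in pending:
            if proc.is_alive():
                still_running.append(proc)
                continue
            # The process has finished, so its exit code no longer changes.
            if proc.exitcode != 0:
                for other in procs:
                    if other.is_alive():
                        other.terminate()
                return False
        pending = still_running
        if pending:
            time.sleep(poll_interval)
    return True
```

It would be called with the list of already started processes, for example wait_all_or_fail(process) in place of check_exit(process), and the caller can then exit with a non-zero status when it returns False.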

Upvotes: 0
