Terminating all processes in Multiprocessing Pool

Question

I have a script that is essentially an API scraper; it runs perpetually. I strapped a map_async pool to it and it's glorious, but the pool was hiding some errors, which I learned is pretty common. So I incorporated this wrapper helper function.

helper.py

import functools
import traceback

def trace_unhandled_exceptions(func):
    @functools.wraps(func)
    def wrapped_func(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except:
            # Print the traceback here, inside the worker, so it is not lost.
            print('Exception in '+func.__name__)
            traceback.print_exc()
    return wrapped_func

My main script looks like

scraper.py

import multiprocessing as mp
from helper import trace_unhandled_exceptions

start_block = 100
end_block = 50000

@trace_unhandled_exceptions
def main(block_num):
    block = blah_blah(block_num)
    return block

if __name__ == "__main__":
    cpus = min(8, mp.cpu_count()-1 or 1)

    pool = mp.Pool(cpus)
    pool.map_async(main, range(start_block - 20, end_block), chunksize=cpus)
    pool.close()
    pool.join()

This works great, and I'm receiving the exception:

Exception in main
Traceback (most recent call last):
.....

How can I get the script to end on an exception? I've tried incorporating os._exit or sys.exit into the helper function like this:

def trace_unhandled_exceptions(func):
    @functools.wraps(func)
    def wrapped_func(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except:
            print('Exception in '+func.__name__)
            traceback.print_exc()
            os._exit(1)
    return wrapped_func

But I believe it's only terminating the child process and not the entire script. Any advice?

martineau · Accepted Answer

I don't think you need the trace_unhandled_exceptions decorator to do what you want, at least not if you use pool.apply_async() instead of pool.map_async(), because then you can use the error_callback= option it supports to be notified whenever the target function fails. Note that map_async() supports something similar, but its callback is not called until the entire iterable has been consumed, so it would not be suitable for what you want to do.
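To illustrate the map_async() caveat, here is a minimal sketch. The names work and run_map_async are made up for this demo and are not part of the original script. It collects whatever the error_callback receives; as far as I can tell from the standard library's behavior, the callback fires once, after the whole iterable has been handled, rather than at the moment a task fails.

```python
import multiprocessing as mp


def work(n):
    # Hypothetical worker: fails on exactly one input.
    if n == 3:
        raise ValueError(f"bad item {n}")
    return n


def run_map_async(items):
    errors = []
    with mp.Pool(2) as pool:
        # The callback runs in the parent process, so appending to a
        # plain list is enough to record what it was given.
        result = pool.map_async(work, items, error_callback=errors.append)
        result.wait()  # returns once every chunk has been processed
    return errors


if __name__ == "__main__":
    errs = run_map_async(range(10))
    print(len(errs), repr(errs[0]))
```

With apply_async(), by contrast, each task carries its own error_callback, so a failure can be acted on as soon as that one task finishes.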

I got the idea for this approach from @Tim Peters' answer to a similar question titled Multiprocessing Pool - how to cancel all running processes if one returns the desired result?

import multiprocessing as mp
import random
import time


START_BLOCK = 100
END_BLOCK = 1000

def blah_blah(block_num):
    if block_num % 10 == 0:
        print(f'Processing block {block_num}')
    time.sleep(random.uniform(.01, .1))
    return block_num

def main(block_num):
    if random.randint(0, 100) == 42:
        print('Raising random exception')
        raise RuntimeError('RANDOM TEST EXCEPTION')
    block = blah_blah(block_num)
    return block

def error_handler(exception):
    print(f'{exception} occurred, terminating pool.')
    pool.terminate()

if __name__ == "__main__":
    processes = min(8, mp.cpu_count()-1 or 1)
    pool = mp.Pool(processes)
    for i in range(START_BLOCK-20, END_BLOCK):
        pool.apply_async(main, (i,), error_callback=error_handler)
    pool.close()
    pool.join()
    print('-fini-')
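A possible alternative, sketched here under my own assumptions rather than taken from the code above: iterate over pool.imap_unordered() in the parent process. An exception raised in a worker is re-raised in the parent when its result is fetched, and leaving the with block calls pool.terminate(), which stops the remaining workers. The names work and run_until_error are hypothetical.

```python
import multiprocessing as mp


def work(n):
    # Hypothetical worker: fails on one specific block number.
    if n == 5:
        raise RuntimeError(f"failed on block {n}")
    return n


def run_until_error(items, processes=2):
    done = []
    # Pool.__exit__ calls terminate(), so leaving this block early
    # also stops any workers that are still running.
    with mp.Pool(processes) as pool:
        try:
            for value in pool.imap_unordered(work, items):
                done.append(value)
        except RuntimeError as exc:
            # The worker's exception surfaces here, in the parent.
            return done, exc
    return done, None


if __name__ == "__main__":
    finished, error = run_until_error(range(20))
    print(f"{len(finished)} blocks finished before: {error}")
```

This keeps the error handling in ordinary try/except code in the main process, at the cost of having to consume the results there.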
