Reputation: 139
I have two different tasks that I want to split among processes. One task consists of retrieving responses from URLs and writing the responses to a Queue (a multiprocessing Queue, not threading), and I would like to have a few processes working on it in parallel. The other task waits for the Queue to have response data, retrieves responses from it, and writes them to a file; I want to have one process work on this.
The problem is that if I start a pool of processes hitting the URLs, the writing process won't start until all of the pool processes are done. How do I start the pool of processes that hit the URLs and the process that writes to a file at the same time (or one right after the other)?
My code:
from multiprocessing import Pool, Process, Queue

CSV = CHANNEL + ".csv"
response_queue = Queue()
urls = []
for row in read_channel_data(CSV):
    url = "https://some_domain/%s" % row[1]
    urls.append(url)
# This process will start and wait for response_queue to fill up inside func
write_process = Process(target=func, args=(response_queue,))
write_process.start()
write_process.join()
# This never starts
pool = Pool(processes=PROCESSES)
pool.map_async(get_data, urls)
pool.close()
pool.join()
Upvotes: 0
Views: 159
Reputation: 94881
Just move the call to write_process.join() until after the call to pool.join(). The join call blocks until func exits, which won't happen unless the pool code runs. So just call start, and hold off on calling join until you've run the pool code.
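Applied to the code in the question, the reordering looks roughly like this (a minimal sketch using the question's names; the None sentinel is an assumption about how func learns the queue is finished, since that part of func isn't shown):

# Start the writer first, but don't join it yet.
write_process = Process(target=func, args=(response_queue,))
write_process.start()

# Now the pool can run; the workers fill response_queue via get_data.
pool = Pool(processes=PROCESSES)
pool.map_async(get_data, urls)
pool.close()
pool.join()  # wait for all URL fetches to finish

# Hypothetical sentinel so func knows no more data is coming (assumption).
response_queue.put(None)
write_process.join()  # safe now: func can drain the queue and exit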
Upvotes: 3