Reputation: 139
I have two different tasks that I want to split among processes. One task consists of retrieving responses from URLs and writing the responses to a Queue (a multiprocessing Queue, not threading), and I would like to have a few processes working on it in parallel. The other task waits for the Queue to have response data, retrieves responses from it, and writes them to a file; I want to have one process work on this.
The problem is that if I start a pool of processes hitting the URLs, the writing process won't start until all of the pool processes are done. How do I start the pool of processes that hit the URLs and the process that writes to a file at the same time (or one right after the other)?
My code:
from multiprocessing import Pool, Process, Queue

CSV = CHANNEL + ".csv"
response_queue = Queue()
urls = []
for row in read_channel_data(CSV):
    url = "https://some_domain/%s" % row[1]
    urls.append(url)
# This process will start and wait for response_queue to fill up inside func
write_process = Process(target=func, args=(response_queue,))
write_process.start()
write_process.join()
# This never starts
pool = Pool(processes=PROCESSES)
pool.map_async(get_data, urls)
pool.close()
pool.join()
Upvotes: 0
Views: 159
Reputation: 94881
Just move the call to write_process.join() until after the call to pool.join(). The join call blocks until func exits, which won't happen unless the pool code runs. So just call start, and hold off on calling join until you've run the pool code.
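Applied to the code in the question, the reordering looks roughly like this (a minimal sketch using the question's names; the None sentinel is an assumption about how func learns the queue is finished, since that part of func isn't shown):

# Start the writer first, but don't join it yet.
write_process = Process(target=func, args=(response_queue,))
write_process.start()

# Now the pool can run; the workers fill response_queue via get_data.
pool = Pool(processes=PROCESSES)
pool.map_async(get_data, urls)
pool.close()
pool.join()  # wait for all URL fetches to finish

# Hypothetical sentinel so func knows no more data is coming (assumption).
response_queue.put(None)
write_process.join()  # safe now: func can drain the queue and exit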
Upvotes: 3