Reputation: 434
Say I have scrapper_1.py, scrapper_2.py, scrapper_3.py.
The way I run them now is from PyCharm, running/executing each one separately; this way I can see the 3 python.exe processes running in Task Manager.
Now I'm trying to write a master script, say scrapper_runner.py, that imports these scrapers as modules and runs them all in parallel, not sequentially.
I tried examples with subprocess, multiprocessing, even os.system from various SO posts ... but without any luck ... from the logs they all run in sequence, and in Task Manager I only see one python.exe executing.
Is this the right pattern for this kind of process?
EDIT 1: (trying with concurrent.futures ProcessPoolExecutor) it runs sequentially.
from concurrent.futures import ProcessPoolExecutor
import scrapers.scraper_1 as scraper_1
import scrapers.scraper_2 as scraper_2
import scrapers.scraper_3 as scraper_3
## Calling method runner on each scrapper_x to kick off processes
runners_list = [scraper_1.runner(), scraper_1.runner(), scraper_3.runner()]
if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=10) as executor:
        for runner in runners_list:
            future = executor.submit(runner)
            print(future.result())
Upvotes: 0
Views: 226
Reputation: 313
A subprocess in Python may or may not show up as a separate process, depending on your OS and your task manager. htop on Linux, for example, will display subprocesses under the parent process in tree view.
I recommend taking a look at this in-depth tutorial on the multiprocessing module in Python: https://pymotw.com/2/multiprocessing/basics.html
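As a rough illustration of that approach, a minimal sketch with multiprocessing.Process could look like the following (assuming, as in the question, that each scraper module exposes a runner() function):
from multiprocessing import Process

import scrapers.scraper_1 as scraper_1
import scrapers.scraper_2 as scraper_2
import scrapers.scraper_3 as scraper_3

if __name__ == "__main__":
    # One process per scraper; pass the function itself, not the result of calling it
    processes = [Process(target=m.runner) for m in (scraper_1, scraper_2, scraper_3)]
    for p in processes:
        p.start()   # all three start right away, each as its own process
    for p in processes:
        p.join()    # wait for all of them to finish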
However, if Python's built-in methods for multiprocessing/threading don't work or don't make sense to you, you can achieve your desired result by using bash to call your Python scripts. The following bash script produces the result shown in the attached screenshot.
#!/bin/sh
./py1.py &
./py2.py &
./py3.py &
Explanation: the & at the end of each call tells bash to run each call as a background process.
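The same idea works from Python itself: launching each script with subprocess.Popen starts a separate interpreter per script without blocking, so all of them run at once (the script names below are taken from the question and are assumed to sit next to the launcher):
import subprocess
import sys

# Popen returns immediately, so all three scripts run at the same time,
# each in its own python.exe
procs = [
    subprocess.Popen([sys.executable, script])
    for script in ("scrapper_1.py", "scrapper_2.py", "scrapper_3.py")
]

for p in procs:
    p.wait()  # block until every scraper has finished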
Upvotes: 2
Reputation: 6891
Your problem is in how you set up the processes. You are not running them in parallel, even though you think you are. You actually run each scraper at the moment you add it to runners_list, and then you submit the result of each runner (not the runner itself) to the multiprocessing pool.
What you want to do is add the functions to runners_list without executing them, and then have them executed in your multiprocessing pool. The way to achieve this is to add the function references, i.e. the names of the functions. To do this, you should not include the parentheses, since parentheses are the syntax for calling a function rather than just naming it.
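As a tiny self-contained illustration of that difference (the work function here is just a stand-in for your runner):
def work():
    return "done"

result = work()   # the parentheses call the function now; result holds "done"
ref = work        # no parentheses: ref is just another name for the function
print(ref())      # calling through the reference runs it here and prints "done"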
In addition, to have the futures execute asynchronously, you cannot call future.result() directly after each submit, as that forces the code to execute sequentially, to ensure that the results become available in the same sequence as the functions were called.
This means that the solution to your problem is:
from concurrent.futures import ProcessPoolExecutor
import scrapers.scraper_1 as scraper_1
import scrapers.scraper_2 as scraper_2
import scrapers.scraper_3 as scraper_3
## NOT calling method runner on each scraper_x to kick off processes.
## Instead, add the function references to the list of callables to be run in the pool.
runners_list = [scraper_1.runner, scraper_2.runner, scraper_3.runner]

# Callback function to call when a future is done.
# If the result is not printed in the callback, the future.result() call
# will serialize the call sequence to ensure results arrive in order.
def print_result(future):
    print(future.result())

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=10) as executor:
        for runner in runners_list:
            future = executor.submit(runner)
            future.add_done_callback(print_result)
As you can see, here the invocation of the runners does not happen when the list is created, but later, when each runner is submitted to the executor. And when a result is ready, the callback is called to print it to the screen.
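If callbacks feel awkward, an equivalent way to collect the results without serializing the submissions is concurrent.futures.as_completed; here is a sketch under the same assumptions about the scraper modules:
from concurrent.futures import ProcessPoolExecutor, as_completed

import scrapers.scraper_1 as scraper_1
import scrapers.scraper_2 as scraper_2
import scrapers.scraper_3 as scraper_3

runners_list = [scraper_1.runner, scraper_2.runner, scraper_3.runner]

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=10) as executor:
        # Submit everything first so all runners start in parallel
        futures = [executor.submit(runner) for runner in runners_list]
        # as_completed yields each future as soon as it finishes,
        # so results print in completion order rather than submission order
        for future in as_completed(futures):
            print(future.result())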
Upvotes: 0