Reputation: 267
I am working on a data mining project where I want to grab info from multiple sites simultaneously. I am currently doing this by running the same mining script in 20 different terminal windows (on OS X).
My belief (which may be incorrect) is that running the script in separate terminal windows is what makes the mining execute in parallel.
Questions:
A) If I am incorrect about using multiple terminal windows, what would be the best approach?
B) If I am right to use multiple terminal windows, is there an efficient way to have the script execute in 20 different terminal windows?
I set up a prototype using 2 scripts.
Script 1 is trigger.py and is intended to send a list of arguments to a second script. In the trigger script below I am using numbers, but the idea would be to send URLs.
Script 2 is execute.py and is intended to receive the argument and execute, ideally in a new terminal window per argument. In practice, if this approach is the best way, I would put the miner in this script and have it receive the URL, open a new terminal window, and run.
Right now it simply executes in the same window. This is, again, the problem I am seeking help with.
Script 1: trigger.py
#!/usr/bin/python
import os
import sys

class newTerm(object):
    def __init__(self, number):
        self.number = number

    def run(self):
        os.system('/Users/InNov8/Desktop/execute.py ' + str(self.number))

starts = [100, 500, 1000, 2000]
for s in starts:
    new = newTerm(s)
    new.run()
Script 2: execute.py
#!/usr/bin/python
import sys
print 'Number of arguments:', len(sys.argv), 'arguments.'
print 'Argument List:', str(sys.argv)
number = int(sys.argv[1])
print number, number + 400
Upvotes: 1
Views: 1954
Reputation: 12879
To do parallel execution in multiple processes, look at the multiprocessing module.
The code below is a simple example that launches one process for each url in an array. In practice (if the number of urls is arbitrarily large), you would probably want to use a Pool instead, so that you could queue the urls to a fixed number of processes.
from multiprocessing import Process

def worker_process(url):
    # process url...
    print 'processing %s' % url

def main():
    urls = ['http://www1.example.com/', 'http://www2.example.com/']
    workers = []
    for i in range(0, len(urls)):
        p = Process(target=worker_process, args=(urls[i],))
        p.start()
        workers.append(p)
    for worker in workers:
        worker.join()

if __name__ == '__main__':
    main()
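For reference, a rough sketch of the Pool variant mentioned above (the pool size of 20 is an assumption, chosen to match the 20 parallel miners in the question):
from multiprocessing import Pool

def worker_process(url):
    # process url...
    print 'processing %s' % url

def main():
    urls = ['http://www1.example.com/', 'http://www2.example.com/']
    # 20 worker processes; any extra urls wait in the queue until a worker is free
    pool = Pool(processes=20)
    pool.map(worker_process, urls)
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()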
Upvotes: 0
Reputation: 30288
It is not the separate terminal sessions but the separate processes/threads that allow things to run in parallel. You can run them in the same shell in the background, as per @asdf.
You can even run them in the same process if you look at the threading module.
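For example, a minimal sketch of the threading approach (process_url here is just a stand-in for the miner):
import threading

def process_url(url):
    # stand-in for the mining work
    print 'processing %s' % url

urls = ['http://www1.example.com/', 'http://www2.example.com/']
threads = [threading.Thread(target=process_url, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()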
However, if they produce output (e.g. diagnostic/progress messages), they will write over the top of each other. In that case you can use screen to launch a number of processes in a virtual terminal session with independent input and output:
os.system('screen -dmS scrape /Users/InNov8/Desktop/execute.py ' + str(self.number))
The -dm means launch in a detached state, and -S scrape gives the session a name. Then you can attach to this screen from any terminal window with:
$ screen -r scrape
You can move between the various running processes with <Ctrl-a>n and <Ctrl-a>p, and detach with <Ctrl-a>d
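If you launch one of these per URL, reusing the same session name makes screen -r scrape ambiguous; a sketch of giving each run its own session name (the naming scheme is just an assumption):
def run(self):
    # one detached screen session per argument, e.g. scrape_100, scrape_500, ...
    session = 'scrape_%d' % self.number
    os.system('screen -dmS %s /Users/InNov8/Desktop/execute.py %d' % (session, self.number))
screen -ls then lists the running sessions, and screen -r scrape_100 attaches to a specific one.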
Upvotes: 0
Reputation: 3067
An easy way to do this would be to run the scripts in the background, which is actually pretty simple. Just append an & to the end of your call (sending the command to the background) and you can run them all in the same terminal:
python trigger.py [params] &
You could even write a bash script to start all of them simultaneously with one command. You could also use this to aggregate return values into one place for ease of use:
miner.sh
#!/bin/bash
python trigger.py [params1] &
python trigger.py [params2] &
#etc
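If you also want the script to block until every miner has finished, ending miner.sh with the shell's wait builtin (which waits for all background jobs to exit) is one option.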
Upvotes: 1