Reputation: 6891
I'm new to Python's subprocess module. Currently my implementation does not use multiple processes:
import shlex
import subprocess

def forcedParsing(fname):
    cmd = 'strings "%s"' % (fname)
    #print cmd
    args= shlex.split(cmd)
    try:
        sp = subprocess.Popen( args, shell = False, stdout = subprocess.PIPE, stderr = subprocess.PIPE )
        out, err = sp.communicate()
    except OSError as e:
        # Popen could not start the command: report the error and skip this file
        print("Error no %s Message %s" % (e.errno, e.strerror))
        return None
    if sp.returncode == 0:
        #print "Processed %s" %fname
        return out

res=[]
for f in file_list: res.append(forcedParsing(f))
My questions:
1) Is sp.communicate() a good way to go? Should I use poll()? If I use poll(), I need a separate process that monitors whether the process has finished, right?
2) Should I fork at the for loop?
Upvotes: 1
Views: 2095
Reputation: 67860
1) Popen.communicate() seems the right option for what you are trying to do. You don't need to poll the process: communicate() returns only once the process has finished.
2) You mean forking to parallelize the work? Take a look at multiprocessing (Python >= 2.6). Running parallel processes using subprocess alone is of course possible, but it takes quite a bit of work: you cannot just call communicate(), which blocks.
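To make point 2 concrete, here is a minimal sketch of one way to run several commands at once, assuming Python 3. It uses ThreadPool from multiprocessing.pool rather than a process pool: each communicate() call just waits on an external process, so worker threads are enough. The helper names run_command and run_all are made up for illustration, and the Python interpreter stands in for strings so the example runs anywhere.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def run_command(args):
    """Run one external command and return (returncode, stdout, stderr)."""
    sp = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = sp.communicate()  # blocks only the calling worker thread
    return sp.returncode, out, err


def run_all(commands, workers=4):
    """Run the commands concurrently; results come back in input order."""
    pool = ThreadPool(workers)
    try:
        return pool.map(run_command, commands)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # Stand-in commands; for the question each one would be ["strings", fname].
    commands = [[sys.executable, "-c", "print(%d)" % i] for i in range(3)]
    for returncode, out, err in run_all(commands):
        print(returncode, out)
```

The workers value of 4 is arbitrary; the right number depends on the machine and the files.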
About your code:
cmd = 'strings "%s"' % (fname)
args= shlex.split(cmd)
Why not simply?
args = ["strings", fname]
As for this ugly pattern:
res=[]
for f in file_list: res.append(forcedParsing(f))
You should use list comprehensions whenever possible:
res = [forcedParsing(f) for f in file_list]
Upvotes: 3
Reputation: 6393
About question 2: forking at the for loop will mostly speed things up if the script is supposed to run on a system with multiple cores/processors. It will consume more memory, though, and will stress I/O harder. There will be a sweet spot somewhere that depends on the number of files in file_list, but only benchmarking on a realistic target system can tell you where it is. Once you find that number, you could add an if len(file_list) > <your number>: check with optional fork()ing [Edit: rather, as @tokland says, via multiprocessing if it's available on your Python version (2.6+)] that chooses the most efficient strategy on a per-job basis.
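A rough sketch of that per-job switch, assuming Python 3. The name process_files and the threshold value are hypothetical; the real number has to come from your own benchmarks. The parallel branch uses a thread-backed pool from multiprocessing.pool, on the assumption that each worker mostly waits on an external process.

```python
from multiprocessing.pool import ThreadPool

# Hypothetical cut-off; replace it with the number found by benchmarking.
PARALLEL_THRESHOLD = 8


def process_files(file_list, worker, threshold=PARALLEL_THRESHOLD, workers=4):
    """Call worker(f) for every file, going parallel only for larger jobs."""
    if len(file_list) <= threshold:
        # Small job: a plain loop avoids the pool start-up cost.
        return [worker(f) for f in file_list]
    pool = ThreadPool(workers)
    try:
        # Large job: spread the calls over the pool, keeping input order.
        return pool.map(worker, file_list)
    finally:
        pool.close()
        pool.join()
```

With the question's code, the call would look like res = process_files(file_list, forcedParsing).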
Read about Python profiling here: http://docs.python.org/library/profile.html
If you're on Linux, you can also run time: http://linuxmanpages.com/man1/time.1.php
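For a quick comparison without leaving Python, a small wall-clock helper is often enough. This is only a sketch, assuming Python 3; timed is a made-up name, and it measures a single run, so repeat it a few times before trusting the numbers.

```python
import timeit


def timed(func, *args):
    """Call func(*args) once and return (result, elapsed_seconds)."""
    start = timeit.default_timer()
    result = func(*args)
    elapsed = timeit.default_timer() - start
    return result, elapsed


# Whole-script alternatives from the shell (script.py is a placeholder name):
#   time python script.py
#   python -m cProfile -s cumulative script.py
```

For example, result, seconds = timed(forcedParsing, fname) times a single file with the question's function.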
Upvotes: 2
Reputation: 838216
There are several warnings in the subprocess documentation that advise you to use communicate() to avoid problems with a process blocking, so it would be a good idea to use that.
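The blocking those warnings describe happens when a child fills a pipe that nobody is reading: calling wait() with stdout=PIPE or stderr=PIPE can then hang forever. A small sketch of the safe pattern, assuming Python 3; the child command and the name run_noisy_child are invented for the demonstration.

```python
import subprocess
import sys

# Stand-in child that writes a lot to both streams, more than a typical
# OS pipe buffer holds.
CHILD = "import sys; sys.stdout.write('o' * 200000); sys.stderr.write('e' * 200000)"


def run_noisy_child():
    """Run the child and return (returncode, stdout_length, stderr_length)."""
    sp = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() drains both pipes while waiting, so the child never
    # stalls on a full pipe.
    out, err = sp.communicate()
    return sp.returncode, len(out), len(err)


if __name__ == "__main__":
    print(run_noisy_child())
```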
Upvotes: 1