Reputation: 96937
I have a couple of subprocess instances I'd like to string together into a pipeline, but I am stuck and would like to ask for advice.
For example, to mimic:
cat data | foo - | bar - > result
Or:
foo - < data | bar - > result
...I first tried the following, which hangs:
import subprocess, sys
firstProcess = subprocess.Popen(['foo', '-'], stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE)
secondProcess = subprocess.Popen(['bar', '-'], stdin=firstProcess.stdout,
                                 stdout=sys.stdout)
for line in sys.stdin:
    firstProcess.stdin.write(line)
    firstProcess.stdin.flush()
firstProcess.stdin.close()
firstProcess.wait()
My second attempt uses one subprocess instance with the shell=True parameter, which works:
import subprocess, sys
pipedProcess = subprocess.Popen(" ".join(['foo', '-', '|', 'bar', '-']),
                                stdin=subprocess.PIPE, shell=True)
for line in sys.stdin:
    pipedProcess.stdin.write(line)
    pipedProcess.stdin.flush()
pipedProcess.stdin.close()
pipedProcess.wait()
What am I doing wrong with the first, chained subprocess approach? I read that it is best not to use shell=True, so I'd like to understand why the first approach fails. Thanks for your advice.
EDIT
I fixed a typo in my question and fixed the stdin parameter of secondProcess. It still hangs.
I also tried removing firstProcess.wait(), which resolves the hang, but then I get a 0-byte file as result.
I'll stick with pipedProcess, since it works fine. But if anyone knows why the first setup hangs or produces a 0-byte output file, I'd be interested to know.
Upvotes: 2
Views: 185
Reputation: 414139
To fix the first example, add foo_process.stdout.close() as the docs suggest. The following code emulates the foo - | bar - command:
#!/usr/bin/python
from subprocess import Popen, PIPE
foo_process = Popen(['foo', '-'], stdout=PIPE)
bar_process = Popen(['bar', '-'], stdin=foo_process.stdout)
foo_process.stdout.close() # allow foo to know if bar ends
bar_process.communicate() # equivalent to bar_process.wait() in this case
You don't need to use sys.stdin or sys.stdout explicitly here unless they're different from sys.__stdin__ and sys.__stdout__.
To emulate the foo - < data | bar - > result command:
#!/usr/bin/python
from subprocess import Popen, PIPE
with open('data', 'rb') as input_file, open('result', 'wb') as output_file:
    foo = Popen(['foo', '-'], stdin=input_file, stdout=PIPE)
    bar = Popen(['bar', '-'], stdin=foo.stdout, stdout=output_file)
    foo.stdout.close() # allow foo to know if bar ends
    bar.wait()
If you want to feed modified input line by line to the foo process, i.e., to emulate the python modify_input.py | foo - | bar - command:
#!/usr/bin/python
import sys
from subprocess import Popen, PIPE
foo_process = Popen(['foo', '-'], stdin=PIPE, stdout=PIPE)
bar_process = Popen(['bar', '-'], stdin=foo_process.stdout)
foo_process.stdout.close() # allow foo to know if bar ends
for line in sys.stdin:
    # Python 2 print statement; the trailing comma suppresses print's own newline
    print >>foo_process.stdin, "PY", line, # modify input, feed it to `foo`
foo_process.stdin.close() # tell foo there is no more input
bar_process.wait()
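The snippet above uses Python 2 syntax. As a rough Python 3 sketch of the same idea (not a drop-in replacement): pipes carry bytes, so the prefix is written as b"PY ". The FOO and BAR commands and the run_pipeline helper are hypothetical stand-ins so the example runs without real foo and bar programs, and bar's output goes to a temporary file only so that it can be inspected afterwards.

```python
import sys
import tempfile
from subprocess import Popen, PIPE

# Hypothetical stand-ins for `foo -` and `bar -` (assumptions for this sketch):
# FOO upper-cases its input, BAR prefixes every line with "> ".
FOO = [sys.executable, "-c",
       "import sys; sys.stdout.write(sys.stdin.read().upper())"]
BAR = [sys.executable, "-c",
       "import sys; sys.stdout.writelines('> ' + l for l in sys.stdin)"]


def run_pipeline(lines, foo_cmd=FOO, bar_cmd=BAR):
    """Feed modified lines to foo, pipe foo into bar, return bar's output."""
    with tempfile.TemporaryFile() as result:
        foo = Popen(foo_cmd, stdin=PIPE, stdout=PIPE)
        bar = Popen(bar_cmd, stdin=foo.stdout, stdout=result)
        foo.stdout.close()  # allow foo to know if bar ends
        for line in lines:
            foo.stdin.write(b"PY " + line)  # modify input, feed it to foo
        foo.stdin.close()  # tell foo there is no more input
        bar.wait()
        foo.wait()
        result.seek(0)  # the child wrote through the same file descriptor
        return result.read()
```

To process standard input as in the original, something like sys.stdout.buffer.write(run_pipeline(sys.stdin.buffer)) should work, since sys.stdin.buffer yields bytes lines.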
Upvotes: 1
Reputation: 21269
shell=True works because you're asking the shell to interpret your entire command line and handle the piping itself. It is effectively as if you typed foo - | bar - directly into the shell.
(This is also why it can be unsafe to use shell=True; there are many ways to fool the shell into doing bad things that won't happen if you directly pass the command and arguments in as a list that isn't subject to parsing by any intermediaries.)
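A small illustration of that difference, assuming a POSIX shell and an echo executable on the PATH. The untrusted value is a made-up example of input you don't control, and shlex.quote is shown only as one way to reduce the risk when a shell string can't be avoided.

```python
import shlex
import subprocess

# Hypothetical untrusted value, e.g. a filename supplied by a user.
untrusted = "data; echo INJECTED"

# List form: no shell is involved, so echo receives the value as one argument.
safe = subprocess.run(["echo", untrusted],
                      capture_output=True, text=True).stdout

# shell=True: the shell parses the whole string, so ";" starts a second command.
unsafe = subprocess.run("echo " + untrusted, shell=True,
                        capture_output=True, text=True).stdout

# Quoting the value first makes the shell treat it as a single word again.
quoted = subprocess.run("echo " + shlex.quote(untrusted), shell=True,
                        capture_output=True, text=True).stdout

print(repr(safe))    # 'data; echo INJECTED\n'
print(repr(unsafe))  # 'data\nINJECTED\n'
print(repr(quoted))  # 'data; echo INJECTED\n'
```

Here the injected command is harmless, but the same mechanism would run anything placed after the semicolon.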
Upvotes: 2