Reputation: 401
What I want to achieve
Generator -> Popen(...) -> Generator
without holding too much data in memory. Here is a minimal example that demonstrates what I want to achieve:
from io import StringIO
from subprocess import Popen, PIPE
import time

def stream():
    proc_input = StringIO("aa\nbb\ncc\ndd")
    proc = Popen(["cat"], stdin=PIPE, stdout=PIPE)
    for line in proc_input:
        proc.stdin.write(line.encode())
        # This is the call that blocks.
        yield proc.stdout.readline()
        time.sleep(1)
Problem: proc.stdout.readline() just blocks and doesn't return anything.
What I already learned:
If the input is a real file object (i.e. one with fileno() implemented), I can pass it directly to stdin and avoid writing to the PIPE. But to do so, I first need to stream the generator to a file, which I would like to avoid, as it seems an unnecessary detour. For example, the following works:
import tempfile
from subprocess import Popen, PIPE
tp = tempfile.TemporaryFile()
tp.write("aa\nbb\ncc\ndd".encode())
tp.seek(0)
proc = Popen(["cat"], stdin=tp, stdout=PIPE)
for line in proc.stdout:
    print(line)
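Writing all of the input and closing stdin before reading also works:
proc_input = StringIO("aa\nbb\ncc\ndd")
proc = Popen(["cat"], stdin=PIPE, stdout=PIPE)
for line in proc_input:
    proc.stdin.write(line.encode())
proc.stdin.close()  # EOF lets cat finish and flush its output
for line in proc.stdout:
    print(line)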
What I also tried:
Popen(..., bufsize=), but it seemed to have no effect.
io.BufferedWriter, in the hope that Popen could accept it as input for stdin. Also without success.
Additional info: I'm using Linux.
Remarks on comments
It was suggested to break the input generator into chunks. This can be achieved via:
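from itertools import islice
from subprocess import Popen, PIPE

def PopenStreaming(process, popen_kwargs, nlines, input):
    input = iter(input)
    while True:
        # Take the next nlines rows (bytes); stop once the input is exhausted.
        chunk = list(islice(input, nlines))
        if not chunk:
            break
        proc = Popen(process, stdin=PIPE, stdout=PIPE, **popen_kwargs)
        # The chunk must be small enough to fit in the OS pipe buffers.
        proc.stdin.writelines(chunk)
        proc.stdin.close()  # signal EOF so the child can finish
        for row in proc.stdout:
            yield row
        proc.wait()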
Upvotes: 1
Views: 1838
Reputation: 11224
I'm not sure if it's always possible to do what you're trying to do. The docs at https://docs.python.org/3/library/subprocess.html say
Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
So you're supposed to use communicate, but that means waiting for the process to terminate:
Popen.communicate(input=None, timeout=None)
Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate.
That means you would be able to use communicate only once, which is not what you want.
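To make the one-shot nature of communicate concrete, here is a minimal sketch. It assumes a Linux system where cat is available and uses the sample data from the question with a trailing newline added. All input is handed over in a single call, and the output only comes back after the process has exited:

```python
from subprocess import Popen, PIPE

proc = Popen(["cat"], stdin=PIPE, stdout=PIPE)

# communicate() sends the whole input, closes stdin, reads stdout
# until end-of-file and waits for the process to terminate.
out, _ = proc.communicate(input=b"aa\nbb\ncc\ndd\n")

lines = out.splitlines(keepends=True)
print(lines)
```

Both the complete input and the complete output are held in memory here, which is exactly what a streaming setup tries to avoid.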
However, I think using line-buffered text mode should be enough to avoid a deadlock:
from subprocess import Popen, PIPE

kwargs = {
    "stdin": PIPE,
    "stdout": PIPE,
    "universal_newlines": True,  # text mode
    "bufsize": 1,  # line buffered
}
with Popen(["cat"], **kwargs) as process:
    for data in ["A\n", "B\n", "C\n"]:
        process.stdin.write(data)
        print("data sent:", data)
        output = process.stdout.readline()
        print("output received:", output)
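The same idea can be wrapped in a generator that keeps the pipes in binary mode and flushes stdin explicitly after each line. This is only a sketch: stream_through is a hypothetical helper name, and it assumes that every input line ends with a newline and that the child writes exactly one output line per input line without buffering it, as cat does. A child that block-buffers its stdout when writing to a pipe would still stall at readline().

```python
from subprocess import Popen, PIPE

def stream_through(cmd, lines):
    """Yield one output line per input line (hypothetical helper)."""
    with Popen(cmd, stdin=PIPE, stdout=PIPE) as proc:
        for line in lines:
            proc.stdin.write(line)
            # Push the line out of Python's write buffer so the child sees it.
            proc.stdin.flush()
            yield proc.stdout.readline()

# Input is a generator of newline-terminated bytes lines.
source = (s.encode() for s in ["aa\n", "bb\n", "cc\n", "dd\n"])
result = list(stream_through(["cat"], source))
print(result)
```

Only one line is in flight at a time, so memory use stays small. Note that the last line in the question's sample data ("dd") has no trailing newline, so readline() would wait forever for it.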
If that isn't applicable in your case, maybe you can split your call into multiple smaller calls? Using check_output with its input keyword argument might also simplify your code:
from subprocess import check_output
output = check_output(["cat"], input=b"something\n")
print(output)
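Combining the two suggestions, splitting the input into smaller calls could look roughly like the sketch below. run_in_chunks is a hypothetical helper, and the approach assumes the command handles each line independently, since every chunk is processed by a fresh process:

```python
from itertools import islice
from subprocess import check_output

def run_in_chunks(cmd, rows, nlines):
    """Feed rows (bytes lines) to cmd in batches of nlines (hypothetical helper)."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, nlines))
        if not chunk:
            return  # input exhausted
        # check_output relies on communicate(), so the pipes cannot deadlock.
        output = check_output(cmd, input=b"".join(chunk))
        yield from output.splitlines(keepends=True)

rows = (f"row {i}\n".encode() for i in range(5))
chunked = list(run_in_chunks(["cat"], rows, nlines=2))
print(chunked)
```

At most nlines input rows and their output are held in memory at once, at the cost of starting one process per chunk.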
Upvotes: 1