physicus

Reputation: 401

Stream in-memory data through an external command with Python's subprocess.Popen

What I want to achieve

Here is a minimal example that demonstrates what I want to achieve:


    from io import StringIO
    from subprocess import Popen, PIPE
    import time

    def stream():  # wrapped in a generator function so that `yield` is valid
        proc_input = StringIO("aa\nbb\ncc\ndd")
        proc = Popen(["cat"], stdin=PIPE, stdout=PIPE)
        for line in proc_input:
            proc.stdin.write(line.encode())
            yield proc.stdout.readline()  # this call blocks
            time.sleep(1)

Problem: proc.stdout.readline() just blocks and never returns anything.

What I already learned:


    import tempfile
    from subprocess import Popen, PIPE

    tp = tempfile.TemporaryFile()
    tp.write("aa\nbb\ncc\ndd".encode())
    tp.seek(0)
    proc = Popen(["cat"], stdin=tp, stdout=PIPE)
    for line in proc.stdout:
        print(line)


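    from io import StringIO
    from subprocess import Popen, PIPE

    proc_input = StringIO("aa\nbb\ncc\ndd")
    proc = Popen(["cat"], stdin=PIPE, stdout=PIPE)
    for line in proc_input:
        proc.stdin.write(line.encode())
    proc.stdin.close()  # EOF: the output can only be read after this

    for line in proc.stdout:
        print(line)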

What I also tried:

Additional info: I'm using Linux.

Remarks on Comments

It was suggested to break the input generator into chunks. This can be achieved via

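    from itertools import islice
    from subprocess import Popen, PIPE

    def PopenStreaming(process, popen_kwargs, nlines, input):
        input = iter(input)
        while True:
            # Take the next nlines rows (bytes); stop once the input is exhausted.
            chunk = list(islice(input, nlines))
            if not chunk:
                break
            proc = Popen(process, stdin=PIPE, stdout=PIPE, **popen_kwargs)
            proc.stdin.writelines(chunk)
            proc.stdin.close()  # always signal EOF, even for a short last chunk
            yield from proc.stdout
            proc.wait()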

Upvotes: 1

Views: 1838

Answers (1)

finefoot

Reputation: 11224

I'm not sure if it's always possible to do what you're trying to do. The docs at https://docs.python.org/3/library/subprocess.html say

Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.

So you're supposed to use communicate, but that means waiting for the process to terminate:

Popen.communicate(input=None, timeout=None) Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate.

That means you would be able to use communicate only once, which is not what you want.
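
To illustrate the one-shot nature, here is a minimal sketch, assuming `cat` as a stand-in for the real command: all input is handed over at once, and the output only comes back after the process has exited.

```python
from subprocess import Popen, PIPE

proc = Popen(["cat"], stdin=PIPE, stdout=PIPE)

# communicate() writes all input, closes stdin, reads stdout until EOF
# and then waits for the process to terminate.
out, _ = proc.communicate(input=b"aa\nbb\ncc\ndd\n")
print(out.splitlines())

# Afterwards the process is finished and its stdin is closed,
# so there is no way to stream further lines to it.
print("return code:", proc.returncode)
print("stdin closed:", proc.stdin.closed)
```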

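However, I think line-buffered text mode should avoid the deadlock here, because every line written to stdin is flushed immediately (this still relies on the child, like cat, writing each output line without buffering it):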

    from subprocess import Popen, PIPE

    kwargs = {
        "stdin": PIPE,
        "stdout": PIPE,
        "universal_newlines": True,  # text mode
        "bufsize": 1,  # line buffered
    }

    with Popen(["cat"], **kwargs) as process:
        for data in ["A\n", "B\n", "C\n"]:
            process.stdin.write(data)
            print("data sent:", data)
            output = process.stdout.readline()
            print("output received:", output)
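
If the command does not answer every input line with exactly one output line, the write/readline pairing above can still block. One possible alternative, sketched here under assumptions, is to feed stdin from a background thread while the main thread reads stdout. The name `stream_through` is hypothetical, and `cat` again stands in for the real command. Note that a child which buffers its own output will still deliver lines late; this only removes the deadlock on the Python side.

```python
import threading
from subprocess import Popen, PIPE


def stream_through(cmd, lines):
    """Yield the command's output lines while feeding `lines` to its stdin."""
    with Popen(cmd, stdin=PIPE, stdout=PIPE, text=True, bufsize=1) as proc:

        def feed():
            # Runs in a background thread, so a full pipe never blocks the reader.
            for line in lines:
                proc.stdin.write(line)
            proc.stdin.close()  # EOF lets the child flush its output and exit

        writer = threading.Thread(target=feed, daemon=True)
        writer.start()
        yield from proc.stdout  # hand out each line as soon as it arrives
        writer.join()


for out in stream_through(["cat"], ["aa\n", "bb\n", "cc\n", "dd\n"]):
    print(out, end="")
```

Because `lines` can be any iterable, a generator that produces the in-memory data lazily can be passed in directly.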

If that isn't applicable in your case, maybe you can split your call into multiple smaller calls? Using check_output with its input keyword argument might also simplify your code:

    from subprocess import check_output

    output = check_output(["cat"], input=b"something\n")
    print(output)
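
Splitting into smaller calls could look roughly like the following sketch. It assumes that the command handles each line independently, so that starting a fresh process per batch does not change the result; `run_in_chunks` and its parameters are hypothetical names chosen for illustration.

```python
from itertools import islice
from subprocess import check_output


def run_in_chunks(cmd, lines, nlines):
    """Run `cmd` once per batch of `nlines` input lines and yield its output lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, nlines))  # next batch of byte strings
        if not chunk:
            break
        # check_output() relies on communicate(), so pipe buffers cannot deadlock.
        output = check_output(cmd, input=b"".join(chunk))
        yield from output.splitlines(keepends=True)


rows = [b"aa\n", b"bb\n", b"cc\n", b"dd\n", b"ee\n"]
for row in run_in_chunks(["cat"], rows, nlines=2):
    print(row)
```

The trade-off is one process start per batch, so a larger `nlines` reduces overhead but delays the first output of each batch.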

Upvotes: 1
