deft_code

Reputation: 59269

read subprocess stdout line by line

My Python script uses subprocess to call a Linux utility that is very noisy. I want to store all of the output in a log file and show some of it to the user. I thought the following would work, but the output doesn't show up in my application until the utility has produced a significant amount of output.

# fake_utility.py, just generates lots of output over time
import time
i = 0
while True:
    print(hex(i)*512)
    i += 1
    time.sleep(0.5)

In the parent process:

import subprocess

proc = subprocess.Popen(['python', 'fake_utility.py'], stdout=subprocess.PIPE)
for line in proc.stdout:
    # the real code does filtering here
    print("test:", line.rstrip())

The behavior I really want is for the filter script to print each line as it is received from the subprocess, like tee does but within Python code.

What am I missing? Is this even possible?


Upvotes: 322

Views: 581673

Answers (14)

talljosh

Reputation: 742

On Linux (and presumably macOS), the parent process sometimes doesn't see the output immediately because the child process buffers its output (see this article for a more detailed explanation).

If the child process is a Python program, you can disable this by setting the environment variable PYTHONUNBUFFERED to 1 as described in this answer.

If the child process is not a Python program, you can sometimes trick it into running in line-buffered mode by creating a pseudo-terminal like so:

import os
import pty
import subprocess

# Open a pseudo-terminal
master_fd, slave_fd = pty.openpty()

# Open the child process on the slave end of the PTY
with subprocess.Popen(
        ['python', 'fake_utility.py'],
        stdout=slave_fd,
        stdin=slave_fd,
        stderr=slave_fd) as proc:

    # Close our copy of the slave FD (without this we won't notice
    # when the child process closes theirs)
    os.close(slave_fd)

    # Convert the master FD into a file-like object
    with open(master_fd, 'r') as stdout:
        try:
            for line in stdout:
                # Do the actual filtering here
                print("test:", line.rstrip())
        except OSError:
            # This happens when the child process closes its STDOUT,
            # usually when it exits
            pass

If the child process does not need to read from STDIN, you can get away without the stdin=slave_fd argument to subprocess.Popen(), as the child process should be checking the status of STDOUT (not STDIN) when it decides whether or not to use line-buffering.

Finally, some programs may actually directly open and write to their controlling terminal instead of writing to STDOUT. If you need to catch this case, you can use the setsid utility by replacing ['python', 'fake_utility.py'] with ['setsid', 'python', 'fake_utility.py'] in the call to subprocess.Popen().
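A sketch of that variant, assuming the setsid utility is available on the system; everything else mirrors the PTY example above:

import os
import pty
import subprocess

master_fd, slave_fd = pty.openpty()

# Run the child under setsid, as suggested above, so that output it
# writes directly to its controlling terminal is also captured via the PTY.
with subprocess.Popen(
        ['setsid', 'python', 'fake_utility.py'],
        stdout=slave_fd,
        stdin=slave_fd,
        stderr=slave_fd) as proc:

    os.close(slave_fd)

    with open(master_fd, 'r') as stdout:
        try:
            for line in stdout:
                print("test:", line.rstrip())
        except OSError:
            pass  # the child closed its end of the PTY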

Upvotes: 0

wim

Reputation: 362717

The subprocess module has come a long way since 2010, and most of the answers here are quite outdated.

Here is a simple approach that works on modern Python versions:

from subprocess import Popen, PIPE, STDOUT

with Popen(args, stdout=PIPE, stderr=STDOUT, text=True) as proc:
    for line in proc.stdout:
        print(line)
rc = proc.returncode

About using Popen as a context manager (supported since Python 3.2): on exit of the with block, the standard file descriptors are closed, the process is waited on, and the returncode attribute is set. See subprocess.py:Popen.__exit__ in the CPython sources.

Upvotes: 17

Steven Dickinson

Reputation: 345

I came here with the same problem and found that none of the provided answers really worked for me. The closest was adding sys.stdout.flush() to the child process, which works but means modifying that process, which I didn't want to do.

Setting bufsize=1 in the Popen() call didn't seem to have any effect for my use case. I guess the problem is that the child process is buffering, regardless of how I call Popen().

However, I found a question with a similar problem (How can I flush the output of the print function?), and one of the answers is to set the environment variable PYTHONUNBUFFERED=1 when calling Popen(). This works how I want it to, i.e. real-time line-by-line reading of the output of the child process.
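A minimal sketch of that approach, reusing fake_utility.py from the question; the env argument is the only change from the usual Popen call:

import os
import subprocess

# Copy our environment and ask the child Python not to buffer its output.
env = dict(os.environ, PYTHONUNBUFFERED='1')

proc = subprocess.Popen(['python', 'fake_utility.py'],
                        stdout=subprocess.PIPE, env=env, text=True)
for line in proc.stdout:
    print('test:', line.rstrip())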

Upvotes: 0

duggi

Reputation: 566

An improved version of https://stackoverflow.com/a/57093927/2580077, suitable for Python 3.10.

A function to iterate over both stdout and stderr of the process in parallel.

Improvements:

  • Unified queue to maintain the order of entries in stdout and stderr.
  • Yield all available lines in stdout and stderr - this is useful when the calling process is slower.
  • Sleep between loop iterations so the generator doesn't spin and utilize 100% of the CPU.

import time
from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor

def enqueue_output(file, queue, level):
    for line in file:
        queue.put((level, line))
    file.close()


def read_popen_pipes(p, blocking_delay=0.5):

    with ThreadPoolExecutor(2) as pool:
        q = Queue()

        pool.submit(enqueue_output, p.stdout, q, 'stdout')
        pool.submit(enqueue_output, p.stderr, q, 'stderr')

        while True:
            if p.poll() is not None and q.empty():
                break

            lines = []
            while not q.empty():
                lines.append(q.get_nowait())

            if lines:
                yield lines

            # without this delay the loop would spin as fast as possible and utilize 100% of the CPU
            time.sleep(blocking_delay)

Usage:

with subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, bufsize=1, universal_newlines=True) as p:
    for lines in read_popen_pipes(p):
        # lines - all the log entries since the last loop run.
        print('ext cmd', lines)
        # process lines

Upvotes: 0

shakram02

Reputation: 11826

I tried this with Python 3 and it worked (source).

When you use Popen to spawn the child process, you tell the operating system to PIPE its stdout so the parent process can read it; here, stderr is redirected into the same pipe via stderr=subprocess.STDOUT.

In output_reader we read the child's stdout line by line, by wrapping readline in an iterator that yields each line as soon as it is ready.

import subprocess
import threading
import time


def output_reader(proc):
    for line in iter(proc.stdout.readline, b''):
        print('got line: {0}'.format(line.decode('utf-8')), end='')


def main():
    proc = subprocess.Popen(['python', 'fake_utility.py'],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)

    t = threading.Thread(target=output_reader, args=(proc,))
    t.start()

    try:
        # Do the parent's own work while the reader thread prints the
        # child's output as it arrives.
        i = 0
        while i < 10:
            print(hex(i) * 512)
            i += 1
            time.sleep(0.5)
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=0.2)
            print('== subprocess exited with rc =', proc.returncode)
        except subprocess.TimeoutExpired:
            print('subprocess did not terminate in time')
    t.join()


if __name__ == '__main__':
    main()

Upvotes: 1

Stan S.

Reputation: 277

I was having a problem with the arg list of Popen when using it to update servers; the following code resolves this a bit.

import getpass
from subprocess import Popen, PIPE

username = 'user1'
ip = '127.0.0.1'

print ('What is the password?')
password = getpass.getpass()
cmd1 = f"""sshpass -p {password} ssh {username}@{ip}"""
cmd2 = f"""echo {password} | sudo -S apt update"""
cmd3 = " && "
cmd4 = f"""echo {password} | sudo -S apt upgrade -y"""
cmd5 = " && "
cmd6 = "exit"
commands = [cmd1, cmd2, cmd3, cmd4, cmd5, cmd6]

command = " ".join(commands)

cmd = command.split()

with Popen(cmd, stdout=PIPE, bufsize=1, universal_newlines=True) as p:
    for line in p.stdout:
        print(line, end='')

To run the update on the local computer, the following example does the same.

import getpass
from subprocess import Popen, PIPE

print ('What is the password?')
password = getpass.getpass()

cmd1_local = "apt update"
cmd2_local = "apt upgrade -y"
commands = [cmd1_local, cmd2_local]

with Popen(['echo', password], stdout=PIPE) as auth:
    for cmd in commands:
        cmd = cmd.split()
        with Popen(['sudo','-S'] + cmd, stdin=auth.stdout, stdout=PIPE, bufsize=1, universal_newlines=True) as p:
            for line in p.stdout:
                print(line, end='')

Upvotes: 0

Rômulo Ceccon

Reputation: 10347

I think the problem is with the statement for line in proc.stdout, which reads the entire input before iterating over it. The solution is to use readline() instead:

# filters output
import subprocess
proc = subprocess.Popen(['python', 'fake_utility.py'], stdout=subprocess.PIPE)
while True:
    line = proc.stdout.readline()
    if not line:
        break
    # the real code does filtering here
    print "test:", line.rstrip()

Of course you still have to deal with the subprocess' buffering.

Note: according to the documentation, the solution with an iterator should be equivalent to using readline(), except for the read-ahead buffer, but (or perhaps precisely because of this) the proposed change produced different results for me (Python 2.5 on Windows XP).

Upvotes: 249

StefanQ

Reputation: 784

Python 3.5 added the run() function to the subprocess module, which returns a CompletedProcess object. With that you are fine using proc.stdout.splitlines(). Note that capture_output and text require Python 3.7, and that run() waits for the command to finish, so the lines only become available once the process exits:

proc = subprocess.run(command, shell=True, capture_output=True, text=True, check=True)
for line in proc.stdout.splitlines():
    print("stdout:", line)

See also How to Execute Shell Commands in Python Using the Subprocess Run Method

Upvotes: 6

jbg

Reputation: 5266

Bit late to the party, but was surprised not to see what I think is the simplest solution here:

import io
import subprocess

proc = subprocess.Popen(["prog", "arg"], stdout=subprocess.PIPE)
for line in io.TextIOWrapper(proc.stdout, encoding="utf-8"):  # or another encoding
    # do something with line, e.g.:
    print(line, end="")

(This requires Python 3.)

Upvotes: 110

Rotareti

Reputation: 53803

A function that allows iterating over both stdout and stderr concurrently, in real time, line by line

In case you need to get the output stream for both stdout and stderr at the same time, you can use the following function.

The function uses Queues to merge both Popen pipes into a single iterator.

Here we create the function read_popen_pipes():

from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor


def enqueue_output(file, queue):
    for line in iter(file.readline, ''):
        queue.put(line)
    file.close()


def read_popen_pipes(p):

    with ThreadPoolExecutor(2) as pool:
        q_stdout, q_stderr = Queue(), Queue()

        pool.submit(enqueue_output, p.stdout, q_stdout)
        pool.submit(enqueue_output, p.stderr, q_stderr)

        while True:

            if p.poll() is not None and q_stdout.empty() and q_stderr.empty():
                break

            out_line = err_line = ''

            try:
                out_line = q_stdout.get_nowait()
            except Empty:
                pass
            try:
                err_line = q_stderr.get_nowait()
            except Empty:
                pass

            yield (out_line, err_line)

read_popen_pipes() in use:

import subprocess as sp


with sp.Popen(my_cmd, stdout=sp.PIPE, stderr=sp.PIPE, text=True) as p:

    for out_line, err_line in read_popen_pipes(p):

        # Do stuff with each line, e.g.:
        print(out_line, end='')
        print(err_line, end='')

    rc = p.poll()  # status code of the finished process

Upvotes: 26

mdh

Reputation: 5563

The following modification of Rômulo's answer works for me on Python 2 and 3 (2.7.12 and 3.6.1):

import os
import subprocess

process = subprocess.Popen(command, stdout=subprocess.PIPE)
while True:
    line = process.stdout.readline()
    if not line:  # b'' on Python 3, '' on Python 2: EOF either way
        break
    os.write(1, line)  # write the raw bytes straight to fd 1 (stdout)

Upvotes: 0

aiven

Reputation: 4313

You can also read all lines at once, without a loop. Works in Python 3.6. Note that readlines() blocks until the process closes its stdout, so you get the lines after the fact rather than in real time.

import os
import subprocess

process = subprocess.Popen(command, stdout=subprocess.PIPE)
list_of_byte_strings = process.stdout.readlines()

Upvotes: 7

user1747134

Reputation: 2472

You want to pass these extra parameters to subprocess.Popen:

bufsize=1, universal_newlines=True

Then you can iterate as in your example. (Tested with Python 3.5)
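A minimal sketch with those parameters, reusing fake_utility.py from the question:

import subprocess

proc = subprocess.Popen(
    ['python', 'fake_utility.py'],
    stdout=subprocess.PIPE,
    bufsize=1,                # line-buffered; only meaningful in text mode
    universal_newlines=True,  # decode bytes to str (spelled text=True on Python >= 3.7)
)
for line in proc.stdout:
    print('test:', line.rstrip())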

Upvotes: 18

Steve Carter

Reputation: 505

Indeed, once you have sorted out the iterator, buffering could be your problem. You can tell the Python interpreter in the subprocess not to buffer its output.

proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)

becomes

proc = subprocess.Popen(['python','-u', 'fake_utility.py'],stdout=subprocess.PIPE)

I have needed this when calling Python from within Python.

Upvotes: 30
