How to monitor the total size of output produced by a subprocess in real time?

Question

The code below is a toy example of the actual situation I am dealing with¹. (Warning: this code will loop forever.)

import subprocess
import uuid


class CountingWriter:
    def __init__(self, filepath):
        self.file = open(filepath, mode='wb')
        self.counter = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.file.close()

    def __getattr__(self, attr):
        return getattr(self.file, attr)

    def write(self, data):
        written = self.file.write(data)
        self.counter += written
        return written

with CountingWriter('myoutput') as writer:
    with subprocess.Popen(['/bin/gzip', '--stdout'],
                          stdin=subprocess.PIPE,
                          stdout=writer) as gzipper:
        while writer.counter < 10000:
            gzipper.stdin.write(str(uuid.uuid4()).encode())
            gzipper.stdin.flush()
            writer.flush()
            # writer.counter remains unchanged

        gzipper.stdin.close()

In English: I start a subprocess, called gzipper, which receives input through its stdin and writes its compressed output to a CountingWriter object. A while-loop, conditioned on the value of writer.counter, feeds some random content to gzipper at each iteration.

This code does not work!

More specifically, writer.counter never gets updated, so execution never leaves the while-loop.

This example is certainly artificial, but it captures the problem I would like to solve: how to terminate the feeding of data into gzipper once it has written a certain number of bytes.

Q: How must I change the code above to get this to work?

FWIW, I thought that the problem had to do with buffering, hence all the calls to *.flush() in the code. They have no noticeable effect, though. Incidentally, I cannot call gzipper.stdout.flush() because gzipper.stdout is not a CountingWriter object (as I had expected), but rather it is None, surprisingly enough.

¹ In particular, I am using a /bin/gzip --stdout subprocess only for the sake of this example, because it is a more readily available alternative to the compression program that I am actually working with. If I really wanted to gzip-compress my output, I would use Python's standard gzip module.
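
One possible direction, offered as a sketch rather than a definitive fix: when a file-like object is passed as stdout, subprocess.Popen only takes its fileno() and hands that descriptor to the child, so the child writes straight to the file and CountingWriter.write() is never called. That would also explain why gzipper.stdout is None, since that attribute is only set when stdout=subprocess.PIPE is requested. The version below instead asks for a pipe and drains it in a background thread that does the counting. The names CountingPump, feed_until and make_chunk are hypothetical helpers introduced for this illustration; they are not part of the subprocess module.

```python
import subprocess
import threading
import uuid


class CountingPump(threading.Thread):
    """Copy bytes from a pipe into a file, counting them as they arrive."""

    def __init__(self, source, filepath):
        super().__init__(daemon=True)
        self.source = source      # the child's stdout pipe (a BufferedReader)
        self.filepath = filepath
        self.counter = 0          # bytes copied so far; written only by this thread

    def run(self):
        with open(self.filepath, mode='wb') as sink:
            while True:
                # read1() returns as soon as some data is available,
                # and returns b'' once the child closes its stdout.
                chunk = self.source.read1(65536)
                if not chunk:
                    break
                sink.write(chunk)
                self.counter += len(chunk)


def feed_until(command, filepath, limit,
               make_chunk=lambda: str(uuid.uuid4()).encode()):
    """Feed chunks to `command` until it has produced at least `limit` bytes.

    Returns the total number of bytes the child wrote. This can exceed
    `limit`, because the child may flush buffered output after its stdin
    is closed.
    """
    with subprocess.Popen(command,
                          stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE) as child:
        pump = CountingPump(child.stdout, filepath)
        pump.start()
        while pump.counter < limit:
            child.stdin.write(make_chunk())
            child.stdin.flush()
        child.stdin.close()   # signal end of input to the child
        pump.join()           # wait until the remaining output is drained
    return pump.counter
```

Under these assumptions, a call such as feed_until(['/bin/gzip', '--stdout'], 'myoutput', 10000) should stop feeding once at least 10000 compressed bytes have reached the file. The count lags behind the input, since the child buffers internally and only emits output in blocks, so the limit acts as a lower bound on the final size rather than an exact cutoff.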
