Reputation: 35331
The code below is a toy example of the actual situation I am dealing with [1]. (Warning: this code will loop forever.)

import subprocess
import uuid

class CountingWriter:

    def __init__(self, filepath):
        self.file = open(filepath, mode='wb')
        self.counter = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.file.close()

    def __getattr__(self, attr):
        return getattr(self.file, attr)

    def write(self, data):
        written = self.file.write(data)
        self.counter += written
        return written

with CountingWriter('myoutput') as writer:
    with subprocess.Popen(['/bin/gzip', '--stdout'],
                          stdin=subprocess.PIPE,
                          stdout=writer) as gzipper:
        while writer.counter < 10000:
            gzipper.stdin.write(str(uuid.uuid4()).encode())
            gzipper.stdin.flush()
            writer.flush()
            # writer.counter remains unchanged
        gzipper.stdin.close()

In English, I start a subprocess, called gzipper, which receives input through its stdin and writes compressed output to a CountingWriter object. The code features a while-loop, depending on the value of writer.counter, that feeds some random content to gzipper at each iteration.
This code does not work!
More specifically, writer.counter never gets updated, so execution never leaves the while-loop.
This example is certainly artificial, but it captures the problem I would like to solve: how to stop feeding data into gzipper once it has written a certain number of bytes.
Q: How must I change the code above to get this to work?
FWIW, I thought that the problem had to do with buffering, hence all the calls to *.flush() in the code. They have no noticeable effect, though. Incidentally, I cannot call gzipper.stdout.flush() because gzipper.stdout is not a CountingWriter object (as I had expected); surprisingly enough, it is None.
[1] In particular, I am using a /bin/gzip --stdout subprocess only for the sake of this example, because it is a more readily available alternative to the compression program that I am actually working with. If I really wanted to gzip-compress my output, I would use Python's standard gzip module.
Upvotes: 0
Views: 52
Reputation: 110696
Your "writer" is an arbitrary Python object, but subprocess piping needs real files, because the child process works directly with operating-system-level file handles. The only reason any data gets written to the output file at all is that you proxied attribute access through __getattr__, so the code in subprocess was able to retrieve fileno() from your proxied file. That real, operating-system-level file is the only thing the actual subprocess (gzip) sees, not your writer object. Your write() method is therefore never called, and counter never changes.
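To make the point above concrete, here is a minimal sketch. SpyWriter is a hypothetical class invented for this illustration, and a second Python interpreter stands in for gzip so that the snippet does not depend on /bin/gzip being present. It suggests that Popen only asks the object for fileno() and never goes through its write():

```python
import os
import subprocess
import sys
import tempfile


class SpyWriter:
    """Hypothetical wrapper that records whether write() is ever called."""

    def __init__(self, filepath):
        self.file = open(filepath, mode="wb")
        self.write_calls = 0

    def fileno(self):
        # Popen asks for this descriptor and hands it to the child process.
        return self.file.fileno()

    def write(self, data):
        self.write_calls += 1
        return self.file.write(data)

    def close(self):
        self.file.close()


path = os.path.join(tempfile.mkdtemp(), "spy_output")
spy = SpyWriter(path)

# The child is just another Python interpreter printing one word.
child = subprocess.Popen([sys.executable, "-c", "print('hello')"], stdout=spy)
child.wait()
spy.close()

with open(path, "rb") as f:
    content = f.read()

print(spy.write_calls)  # number of times the parent-side write() ran
print(content)          # what the child wrote straight to the descriptor
print(child.stdout)     # only populated when stdout=subprocess.PIPE
```

The same mechanism explains why gzipper.stdout is None in the question: Popen only sets that attribute when stdout=subprocess.PIPE is requested.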
What can be done instead is to promote counter to a property that calls stat on your output file:

import subprocess
import uuid
import os

class CountingWriter:

    def __init__(self, filepath):
        self.filepath = filepath

    @property
    def counter(self):
        if not hasattr(self, "file"):
            return 0
        return os.stat(self.filepath).st_size

    def __enter__(self):
        # By moving the actual file opening into `__enter__`,
        # we avoid side effects from merely instantiating the object.
        self.file = open(self.filepath, mode='wb')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.file.close()
        # Keep the attribute set so `counter` still reports the final size.
        self.file = self.filepath

    def __getattr__(self, attr):
        # `file` itself is missing before `__enter__`; fail cleanly here
        # instead of recursing forever through `self.file`.
        if attr == "file":
            raise AttributeError(attr)
        return getattr(self.file, attr)

    def write(self, data):
        return self.file.write(data)

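
For completeness, here is a sketch of how the feeding loop might look once the byte count comes from stat rather than from a Python-side counter. It uses a plain file object instead of the class above to stay short. The names CHILD_SOURCE and feed_until are made up for this example, and the child is a small Python script that zlib-compresses stdin to stdout, assumed here as a portable stand-in for /bin/gzip --stdout:

```python
import os
import subprocess
import sys
import tempfile
import uuid

# Stand-in for the external compressor (an assumption made for portability):
# a child Python process that zlib-compresses stdin to stdout.
CHILD_SOURCE = """
import sys, zlib
compressor = zlib.compressobj()
while True:
    chunk = sys.stdin.buffer.read1(65536)
    if not chunk:
        break
    sys.stdout.buffer.write(compressor.compress(chunk))
    sys.stdout.buffer.flush()
sys.stdout.buffer.write(compressor.flush())
"""


def feed_until(output_path, limit):
    """Feed random data to the child until the output file reaches `limit` bytes."""
    with open(output_path, "wb") as sink:
        with subprocess.Popen([sys.executable, "-c", CHILD_SOURCE],
                              stdin=subprocess.PIPE,
                              stdout=sink) as child:
            # Ask the operating system how large the file is right now.
            while os.stat(output_path).st_size < limit:
                child.stdin.write(str(uuid.uuid4()).encode())
                child.stdin.flush()
            # Closing stdin signals end of input; leaving the Popen block
            # then waits for the child to finish writing.
            child.stdin.close()
    return os.stat(output_path).st_size


path = os.path.join(tempfile.mkdtemp(), "myoutput")
final_size = feed_until(path, 10000)
print(final_size >= 10000)
```

Note that the limit acts as a lower bound rather than an exact cutoff: the compressor emits output in bursts, and whatever is still sitting in the pipe when the loop exits gets compressed and written as well.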
Upvotes: 2