Reputation: 1246
I wanted to write a function that merges the stdout and stderr of a subprocess.Popen (or, in the general case, any set of input file descriptors) into one generator that yields (file_descriptor, line) tuples.
My first attempt looked like this:

    from select import select
    import subprocess

    def _merge_proc_output(process):
        inputs = (process.stdout, process.stderr)
        while process.poll() is None:
            for f in select(inputs, (), ())[0]:
                line = f.readline()
                if len(line):
                    yield f, line

It seemed to work most of the time. Out of, say, 10 runs only one gets corrupted, and which one is random. Occasionally it misses some lines, and I think it is always lines at the end. Unfortunately I was not able to reproduce it consistently, so it is very hard to debug.
Can anyone see what the issue is with the above code that would cause it to drop lines from the end of one of the streams?
Currently I use code that is more resource-hungry and verbose, but more portable:

    import subprocess
    import threading
    from Queue import Queue

    def _merge_proc_output(process):
        q = Queue()

        def push(fd):
            for l in fd:
                q.put((fd, l))
            q.put(None)

        pipes = (process.stdout, process.stderr)
        threads = [threading.Thread(target=push, args=(fd,)) for fd in pipes]
        [t.start() for t in threads]
        for t in threads:
            while True:
                w = q.get()
                if w is None:
                    break
                yield w
        [t.join() for t in threads]

And this seems to work fine (or at least I haven't noticed issues yet). Still I would like to know what is wrong with my original code.
P.S. If you see issues with the second solution, please comment on that as well.
edit:
Hm, I may have an idea why it happens. Assuming my observation that only the last lines go missing is correct, it might be that process.poll() already returns an exit code while there is still unread data in the output buffers of those streams.
I have modified my original function by adding a loop that reads whatever is left in the output streams:

    def _merge_proc_output(process):
        inputs = (process.stdout, process.stderr)
        while process.poll() is None:
            for f in select(inputs, (), ())[0]:
                line = f.readline()
                if len(line):
                    yield f, line
        for i in inputs:
            for l in i:
                yield i, l

I still need to experiment a bit to check whether this fixes my issue.
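A quick way to check the hypothesis that output can outlive the process is sketched below. It assumes Python 3 (the code above is Python 2) and uses a made-up child command that prints three short lines and exits. The output is kept tiny on purpose so that waiting before reading cannot fill the pipe and deadlock.

```python
import subprocess
import sys

# Hypothetical child for illustration: prints three lines and exits at once.
child = [sys.executable, "-c", "print('one'); print('two'); print('three')"]

proc = subprocess.Popen(child, stdout=subprocess.PIPE)
proc.wait()  # safe only because the output is far smaller than the pipe buffer

# The process is gone, so a `while process.poll() is None` loop would stop here.
exited = proc.poll() is not None

# Yet everything the child wrote is still sitting in the pipe, unread.
leftover = proc.stdout.readlines()
proc.stdout.close()

print(exited, leftover)
```

If `leftover` is non-empty while `exited` is true, then a loop that exits as soon as poll() returns a value can drop trailing lines.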
Upvotes: 1
Views: 96
Reputation: 280638
When the subprocess terminates, you immediately stop reading its output. That means that if you weren't done reading what was already produced, you lose the lines at the end.
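One way to avoid this is to stop looping on process.poll() and instead read each pipe until it reports end-of-file. The following is only a sketch under some assumptions: Python 3 on a POSIX system (select-style calls do not work on pipes on Windows), the standard `selectors` module instead of a bare `select` call, and a hypothetical function name `merge_proc_output`. It reads with `os.read` and splits lines itself, because combining select with a buffered `readline()` may leave data in the file object's buffer where select cannot see it.

```python
import os
import selectors
import subprocess
import sys


def merge_proc_output(process):
    """Yield (stream, line) pairs until both pipes reach end-of-file."""
    sel = selectors.DefaultSelector()
    pending = {}  # stream -> bytes of a not yet completed line
    for stream in (process.stdout, process.stderr):
        sel.register(stream, selectors.EVENT_READ)
        pending[stream] = b""

    # Loop on the pipes, not on process.poll(): a pipe is finished only
    # when a read returns b"", no matter when the process itself exited.
    while pending:
        for key, _ in sel.select():
            stream = key.fileobj
            chunk = os.read(stream.fileno(), 4096)
            if chunk:
                data = pending[stream] + chunk
                # Everything before the last newline is complete lines;
                # the remainder is kept for the next read.
                *lines, pending[stream] = data.split(b"\n")
                for line in lines:
                    yield stream, line + b"\n"
            else:
                # End-of-file: flush an unterminated last line, if any.
                sel.unregister(stream)
                rest = pending.pop(stream)
                if rest:
                    yield stream, rest
    sel.close()
    process.wait()  # reap the child so returncode is set


if __name__ == "__main__":
    # Hypothetical child command, used only to demonstrate the generator.
    cmd = [sys.executable, "-c",
           "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    for stream, line in merge_proc_output(proc):
        name = "stdout" if stream is proc.stdout else "stderr"
        print(name, line)
```

Lines are yielded as bytes. The relative order between the two streams is only as precise as the timing of the reads, which is also true of the versions in the question.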
Upvotes: 1