Sridhar Iyer

Reputation: 2840

Bash pipe to python

I need to consume the output of a bash command through a pipe in real time. For example:

for i in $(seq 1 4); do echo $i; sleep 1; done | ./script.py

where script.py contains:

import sys

for line in sys.stdin.readlines():
    print line

I expect each number to be printed as soon as it becomes available, but the Python script waits for the bash loop to finish before printing anything.

I looked at this related answer, but it didn't solve my problem. How do I achieve this in Python?

Upvotes: 6

Views: 5294

Answers (3)

dawg

Reputation: 103714

With Python 2.7.9 (and probably all Python versions prior to 3.x), this does what you expect:

#!/usr/bin/python

import sys

while True:
    line = sys.stdin.readline()  # blocks until one full line (or EOF) arrives
    if not line:                 # an empty string means EOF
        break
    print line

You can also do:

#!/usr/bin/python

import sys

# iter() keeps calling readline until it returns '' (EOF)
for line in iter(sys.stdin.readline, ''):
    print line

On Python 3.4.3, you can do what abarnert suggests:

#!/usr/local/bin/python3

import sys

for line in sys.stdin:
    print(line)

You can also reopen sys.stdin through the io module, which is what Python 3 uses for its standard streams:

#!/usr/bin/python

import sys, io

for line in io.open(sys.stdin.fileno()):
    print(line)

The first, second, and last methods all stream on Python 2.7.6, 2.7.9, and 3.4.3 on OS X (on Python 3, write print(line) instead of print line, which is a syntax error there); the third method streams only on Python 3.
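
If one script has to run under both Python 2.7 and Python 3, a sketch along these lines may work. It reuses the readline loop from above and swaps print for an explicit write plus flush, so each line goes out as soon as it is read. Note that echo_lines is a hypothetical helper name, not something from this answer, and the in-memory streams are stand-ins so the sketch runs without a pipe attached.

```python
from __future__ import print_function  # harmless on Python 3

import io


def echo_lines(instream, outstream):
    """Copy lines from instream to outstream one at a time (hypothetical helper)."""
    count = 0
    # readline() returns '' only at EOF, so '' is the stop sentinel.
    for line in iter(instream.readline, ''):
        outstream.write(line)   # line already ends with its own newline
        outstream.flush()       # push it out now rather than at exit
        count += 1
    return count


# Stand-in streams (assumption: real use would pass sys.stdin / sys.stdout).
fake_in = io.StringIO(u"1\n2\n3\n")
fake_out = io.StringIO()
copied = echo_lines(fake_in, fake_out)
print(copied, repr(fake_out.getvalue()))
```

In a real script you would call echo_lines(sys.stdin, sys.stdout) after importing sys. Writing the line as-is also avoids the extra blank line that print adds, since each line already carries its trailing newline.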

Upvotes: 6

abarnert

Reputation: 365607

The first problem is that readlines reads all the lines into a list. It can't do that until all of the lines are present, which won't be until stdin has reached EOF.

But you don't actually need a list of the lines, just some iterable of the lines. And a file, like sys.stdin, already is such an iterable. And it's a lazy one, that generates one line at a time as soon as they're available, instead of waiting to generate them all at once.

So:

for line in sys.stdin:
    print line

Whenever you find yourself reaching for readlines, ask yourself whether you really need it. The answer will always be no. (Well, except when you want to call it with an argument, or on some defective not-quite-file-like object.) See Readlines Considered Silly for more.
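
To make the first point concrete, here is a minimal Python 3 sketch (so it uses io-based file objects, not the 2.x file type from the question). It builds a pipe by hand with os.pipe, which is an assumption standing in for the shell pipeline, and shows that readline can return a line while the writer is still open, whereas readlines only makes sense once the writer has closed its end.

```python
import os

# An OS-level pipe, similar to what the shell creates for `producer | consumer`.
read_fd, write_fd = os.pipe()
reader = os.fdopen(read_fd, "r")
writer = os.fdopen(write_fd, "w")

writer.write("1\n")
writer.flush()                 # the producer has sent one line, no EOF yet

first = reader.readline()      # returns as soon as that line is available
print(repr(first))

writer.write("2\n")
writer.close()                 # closing the write end is what signals EOF

rest = reader.readlines()      # only safe to call now: it reads until EOF
print(rest)
reader.close()
```

Had readlines been called before writer.close(), it would have blocked, which is the behaviour described in the question.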


But there's a second problem. It's not that Python is buffering its stdin, or that the other process is buffering its stdout. It's that the file-object iterator does its own internal read-ahead buffering, which on most POSIX platforms will usually keep you from getting the first line until EOF, or at least until a lot of lines have been read.

This is a known problem with Python 2.x, which has been fixed in 3.x,* but that doesn't help you unless you're willing to upgrade.

The solution is mentioned in the Command line and environment docs, and in the manpage on most systems, but buried in the middle of the -u flag documentation:

Note that there is internal buffering in xreadlines(), readlines() and file-object iterators ("for line in sys.stdin") which is not influenced by this option. To work around this, you will want to use "sys.stdin.readline()" inside a "while 1:" loop.

In other words:

while True:
    line = sys.stdin.readline()
    if not line:
        break
    print line

Or:

for line in iter(sys.stdin.readline, ''):
    print line
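
The two-argument form of iter may be unfamiliar, so here is a small sketch of what it does. It is written for Python 3 and uses an in-memory io.StringIO as a stand-in for sys.stdin (an assumption made only so the example runs without a pipe).

```python
import io

# Stand-in for sys.stdin so the example runs without a pipe attached.
stream = io.StringIO("1\n\n3\n")

# iter(callable, sentinel) calls stream.readline() repeatedly and stops
# the first time it returns the sentinel '' (which readline does at EOF).
lines = []
for line in iter(stream.readline, ''):
    lines.append(line)

print(lines)
```

A blank input line comes back as '\n', not '', so it does not end the loop early; only real EOF does.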

For a different problem, in this answer, Alex Martelli points out that you can always just ignore sys.stdin and re-fdopen the file descriptor. Which means that you get a wrapper around a POSIX fd instead of a C stdio handle. But that's neither necessary nor sufficient for this question, because the problem isn't with the C stdio buffering, but the way the file.__iter__ buffering interacts with it.


* Python 3.x doesn't use the C stdio library's buffering anymore; it does everything itself, in the types in the io module, which means the iterator can just share the same buffer the file object itself is using. While io is available on 2.x as well, it's not what you get by default from open, or for the stdio file handles, which is why it doesn't help here.

Upvotes: 9

Alfred Rossi

Reputation: 1986

The current most upvoted answer does not actually answer the question as it does not print the output as it streams. Something like the code below should do what you want:

import sys

def readline():
    while True:
        res = sys.stdin.readline()
        if not res:
            break
        yield res

for line in readline():
    print line

Here, rather than wait for readlines to construct a list, we read a single line and then yield the value. And we just continue consuming input and yielding until the end of the stream is signaled by an empty return from sys.stdin.readline().
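
One way to check that a reader like this really streams is to time-stamp each line as it arrives. The sketch below is Python 3.7+ and is only an illustration: read_lines is a hypothetical generalization of the generator above that takes the stream as a parameter, and the bash loop is replaced by a small child Python process (PRODUCER) so the example is self-contained.

```python
import subprocess
import sys
import time

# Child process standing in for the bash loop: one number, then a pause.
PRODUCER = (
    "import time\n"
    "for i in range(1, 4):\n"
    "    print(i, flush=True)\n"
    "    time.sleep(0.2)\n"
)


def read_lines(stream):
    """Yield lines from stream as they arrive (hypothetical generalization)."""
    while True:
        res = stream.readline()
        if not res:          # '' means the writer closed the pipe
            break
        yield res


proc = subprocess.Popen(
    [sys.executable, "-c", PRODUCER],
    stdout=subprocess.PIPE,
    text=True,
)

start = time.monotonic()
arrivals = []
for line in read_lines(proc.stdout):
    # Record the value and how long after start it showed up.
    arrivals.append((line.strip(), time.monotonic() - start))

proc.stdout.close()
proc.wait()

for value, elapsed in arrivals:
    print("%s arrived after %.2fs" % (value, elapsed))
```

If the reader were waiting for EOF, all three lines would show roughly the same arrival time; with line-at-a-time reading the times should be spread out by about the producer's pause.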

Upvotes: 1
