danielshiplett

Reputation: 699

Reading binary data from stdin

Is it possible to read stdin as binary data in Python 2.6? If so, how?

I see in the Python 3.1 documentation that this is fairly simple, but the facilities for doing this in 2.6 don't seem to be there.

If the methods described in 3.1 aren't available, is there a way to close stdin and reopen it in binary mode?

Just to be clear, I am using 'type' in an MS-DOS shell to pipe the contents of a binary file to my Python code. As far as I understand, this should be the equivalent of the Unix 'cat' command. But when I test this, I always get one byte less than the expected file size.


I'm going the Java/JAR/Jython route because one of my main external libraries is only available as a Java JAR. Unfortunately, I had started my work in Python. It might have been easier to convert my code over to Java a while ago, but since this stuff was all supposed to be compatible, I figured I would keep trucking through it and prove it could be done.

In case anyone was wondering, this is also related to this question I asked a few days ago.

Some of this was answered in this question.

So I'll try to update my original question with some notes on what I have figured out so far.

Upvotes: 45

Views: 44028

Answers (7)

pts

Reputation: 87451

To read binary data from stdin (in Python 2.4–2.7, 3.0–, both Unix and Windows), do this:

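import os, sys
if sys.platform.startswith('win'):
    try:
        # Put stdin's file descriptor into binary mode so the C runtime
        # does not translate \r\n or treat 0x1A as EOF.
        __import__('msvcrt').setmode(sys.stdin.fileno(), os.O_BINARY)
    except ImportError:
        pass
sys.stdin = os.fdopen(sys.stdin.fileno(), 'rb')
# ...
print(sys.stdin.read(4096))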

To read binary data from stdin, without modifying sys.stdin (in Python 2.4–2.7, 3.0–, both Unix and Windows), do this:

import os, sys
f = os.fdopen(os.dup(sys.stdin.fileno()), 'rb')
if sys.platform.startswith('win'):
    try:
        __import__('msvcrt').setmode(f.fileno(), os.O_BINARY)
    except ImportError:
        pass
# ...
print(f.read(4096))

To read binary data unbuffered (i.e. as soon as it is available to the Python process) from a file object, while putting the underlying file descriptor to binary mode, do this (in Python 2.4–2.7, 3.0–, both Unix and Windows):

import os, sys
f = sys.stdin  # Or any other file object.
if sys.platform.startswith('win'):
    try:
        __import__('msvcrt').setmode(f.fileno(), os.O_BINARY)
    except ImportError:
        pass
# ...
print(os.read(f.fileno(), 4096))

For unbuffered binary reads, it's tempting to do f = os.fdopen(f.fileno(), 'rb', 0), but in Python 2.x that doesn't make the read (i.e. f.read(4096)) unbuffered: it still waits indefinitely for more input until 4096 bytes have been read or EOF is reached.
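The os.read behaviour that the last snippet relies on can be seen with a pipe. This is a minimal Python 3 sketch; the pipe and the 5-byte payload are illustrative assumptions. While the write end is still open, os.read(r, 4096) returns the 5 bytes that are already available instead of waiting for 4096.

```python
import os

# Create a pipe; r and w are plain OS file descriptors.
r, w = os.pipe()

# Only 5 bytes are written, but we ask for up to 4096 below.
os.write(w, b"hello")

# os.read returns as soon as some data is available, so this call
# does not block waiting for the remaining 4091 bytes, even though
# the write end has not been closed yet.
chunk = os.read(r, 4096)
print(len(chunk))  # 5

os.close(w)
os.close(r)
```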

Upvotes: 0

Jay

Reputation: 2888

You can perform an unbuffered read with:

os.read(0, bytes_to_read)

where 0 is the file descriptor for stdin.
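Because os.read can return fewer bytes than requested, reading a whole binary input this way needs a loop that stops when os.read returns an empty result (EOF). A minimal sketch; the helper name read_all_fd is hypothetical. On Python 2 under Windows, fd 0 would still need msvcrt.setmode(0, os.O_BINARY) first, as in the other answers.

```python
import os

def read_all_fd(fd, chunk_size=65536):
    """Hypothetical helper: read from a file descriptor until EOF.

    os.read may return fewer bytes than requested, so keep reading
    until it returns an empty bytes object, which signals EOF.
    """
    chunks = []
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:  # b"" means end of file
            break
        chunks.append(chunk)
    return b"".join(chunks)

# Usage with stdin (file descriptor 0):
#     data = read_all_fd(0)
```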

Upvotes: 4

anatoly techtonik

Reputation: 20569

Here is the final cut for Linux/Windows Python 2/3 compatible code to read data from stdin without corruption:

import sys

PY3K = sys.version_info >= (3, 0)

if PY3K:
    source = sys.stdin.buffer
else:
    # Python 2 on Windows opens sys.stdin in text mode, so binary data
    # read from it is corrupted (\r\n becomes \n, 0x1A is treated as EOF)
    if sys.platform == "win32":
        # set sys.stdin to binary mode
        import os, msvcrt
        msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
    source = sys.stdin

b = source.read()
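To see why reading through the text layer loses bytes while the binary buffer does not, here is a small Python 3 sketch using in-memory streams as stand-ins for stdin (an assumption: it mimics the newline translation a text-mode stdin applies on Windows, not the C runtime itself). A 6-byte payload with one CRLF pair comes back as 5 characters from the text wrapper, which is the same "one byte less" symptom described in the question.

```python
import io

payload = b"ab\r\ncd"  # 6 bytes containing one CRLF pair

# Text wrapper with default universal-newline handling: "\r\n" is
# translated to "\n" on read, so one byte disappears.
text_stream = io.TextIOWrapper(io.BytesIO(payload), encoding="latin-1")
text = text_stream.read()

# Reading through .buffer (like sys.stdin.buffer) skips the text layer
# and returns the bytes unchanged.
binary_stream = io.TextIOWrapper(io.BytesIO(payload), encoding="latin-1")
data = binary_stream.buffer.read()

print(len(text), len(data))  # 5 6
```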

Upvotes: 24

Dan Menes

Reputation: 6797

Use the -u command line switch to force Python 2 to treat stdin, stdout and stderr as binary unbuffered streams.

C:> type mydoc.txt | python.exe -u myscript.py
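A way to run the same byte-count check the question describes, sketched for Python 3: pipe a known payload containing \r\n and 0x1A into a child interpreter started with -u and have it report how many bytes it received. Under Python 3, -u only affects buffering, so the child still reads through sys.stdin.buffer; the payload and child script are illustrative assumptions.

```python
import subprocess
import sys

# Bytes that text mode on Windows tends to mangle: CRLF and 0x1A (Ctrl-Z).
payload = b"\x23\r\n\x1a\x45"

# Child script: read stdin through the binary buffer and print the length.
child = "import sys; data = sys.stdin.buffer.read(); print(len(data))"

result = subprocess.run(
    [sys.executable, "-u", "-c", child],
    input=payload,
    stdout=subprocess.PIPE,
    check=True,
)
received = int(result.stdout.decode().strip())
print(received == len(payload))  # True if no bytes were lost
```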

Upvotes: 15

Prestel Nué

Reputation: 851

From the docs (see here):

The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').
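Following that advice, a binary filter in Python 3 reads from sys.stdin.buffer and writes to sys.stdout.buffer. A minimal sketch; the helper name copy_binary and the chunk size are assumptions, and the demo uses io.BytesIO objects in place of the real buffers so it can run on its own.

```python
import io

def copy_binary(src, dst, chunk_size=65536):
    """Hypothetical helper: copy bytes from src to dst until EOF.

    src and dst are binary file objects, e.g. sys.stdin.buffer and
    sys.stdout.buffer under Python 3. Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # b"" signals end of file
            break
        dst.write(chunk)
        total += len(chunk)
    dst.flush()
    return total

# Demo with in-memory streams; as a real filter you would call
#     copy_binary(sys.stdin.buffer, sys.stdout.buffer)
src = io.BytesIO(b"\x00\xff\r\n\x1a")
dst = io.BytesIO()
n = copy_binary(src, dst)
print(n, dst.getvalue() == src.getvalue())  # 5 True
```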

But, as in the accepted answer, invoking python with -u is another option; it forces stdin, stdout and stderr to be totally unbuffered. See the python(1) manpage for details.

See the documentation on io for more information on text buffering. From within Python, sys.stdin = sys.stdin.detach() replaces stdin with its underlying binary buffer; the original text wrapper is unusable after detaching.
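A minimal Python 3 sketch of the detach() approach, using an io.TextIOWrapper over an in-memory byte stream as a stand-in for sys.stdin (an assumption so the example runs without real input). After detaching, reads return bytes with \r\n and 0x1A intact.

```python
import io

# Stand-in for sys.stdin: a text wrapper around an in-memory byte stream.
stdin_like = io.TextIOWrapper(io.BytesIO(b"ab\r\n\x1acd"), encoding="utf-8")

# detach() returns the underlying binary buffer and leaves the text
# wrapper unusable, so rebind the name, as with
# sys.stdin = sys.stdin.detach().
stdin_like = stdin_like.detach()

data = stdin_like.read()
print(data)  # b'ab\r\n\x1acd'
```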

Upvotes: 30

Frazil

Reputation: 107

If you still need this: here is a simple test I used to read a binary file that contains a 0x1A character in the middle.

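import os, sys, msvcrt  # msvcrt exists on Windows only

# Switch stdin to binary mode so 0x1A (Ctrl-Z) is not treated as EOF
msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
s = sys.stdin.read()
print(len(s))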

My test file data was:

0x23, 0x1A, 0x45

Without setting stdin to binary mode, this test prints 1, because 0x1A is treated as EOF. Of course, it works on Windows only, because it depends on the msvcrt module.

Upvotes: 9

Yann Ramin

Reputation: 33197

import sys

data = sys.stdin.read(10)  # Python 2: up to 10 bytes; Python 3: up to 10 characters (use sys.stdin.buffer.read(10) for bytes)

If you need to interpret binary data, use the struct module.
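As an illustration of the struct suggestion, here is a Python 3 sketch that decodes a 10-byte record such as one read with sys.stdin.buffer.read(10). The record layout (a little-endian unsigned 16-bit tag, an unsigned 32-bit length, then 4 raw bytes) is a hypothetical example, not a format from the question.

```python
import struct

# Hypothetical 10-byte record: "<" = little-endian, no padding;
# H = unsigned 16-bit, I = unsigned 32-bit, 4s = 4 raw bytes.
record = b"\x01\x00" + b"\x10\x00\x00\x00" + b"\xde\xad\xbe\xef"

tag, length, payload = struct.unpack("<HI4s", record)
print(tag, length, payload)  # 1 16 b'\xde\xad\xbe\xef'
```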

Upvotes: -3
