Python3 pipe I/O on np.ndarray with raw binary data failed

Question

I have a binary raw data file in.dat storing 4 int32 values.

$ xxd in.dat 
00000000: 0100 0000 0200 0000 0300 0000 0400 0000  ................

I want to read them into an np.ndarray, multiply by 2, then write them to stdout in the same raw binary format as in.dat. The expected output is:

$ xxd out.dat 
00000000: 0200 0000 0400 0000 0600 0000 0800 0000  ................

The code is like this,

#!/usr/bin/env python3

import sys
import numpy as np

if __name__ == '__main__':
    y = np.fromfile(sys.stdin, dtype='int32')
    y *= 2
    sys.stdout.buffer.write(y.astype('int32').tobytes())
    exit(0)

I find it works as expected with <,

$ python3 test.py <in.dat >out.dat

But it does not work with a pipe |. Here is the error message:

$ cat in.dat | python3 test.py >out.dat
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    y = np.fromfile(sys.stdin, dtype='int32')
OSError: obtaining file position failed

What am I missing here?

Bailey Parker · Accepted Answer

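This is because when you redirect a file in with <, stdin is seekable: it isn't a TTY or a pipe, it's just a regular file that has been given FD 0. A pipe has no file position, so the position lookup that np.fromfile performs fails, which is what "obtaining file position failed" is telling you. Try invoking the following script with cat foo.txt | python3 test.py vs python3 test.py <foo.txt (assuming foo.txt contains some text):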



import sys

sys.stdin.seek(1)
print(sys.stdin.read())


The former will error with:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    sys.stdin.seek(1)
io.UnsupportedOperation: underlying stream is not seekable
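If you would rather keep numpy, one possible workaround is to read the raw bytes from sys.stdin.buffer yourself and hand them to np.frombuffer, which never needs to seek. The following is only a sketch: it assumes the whole input fits in memory and that its length is a multiple of 4 bytes, and double_int32 is a hypothetical helper name, not something from the question.

```python
import sys

import numpy as np


def double_int32(raw: bytes) -> bytes:
    """Interpret raw as native-order int32 values and return them doubled (hypothetical helper)."""
    # np.frombuffer on a bytes object gives a read-only view, so an in-place
    # `values *= 2` would fail; `values * 2` allocates a new array instead.
    values = np.frombuffer(raw, dtype=np.int32)
    return (values * 2).tobytes()


def main() -> None:
    # Reading from the binary buffer works for pipes and redirected files alike.
    sys.stdout.buffer.write(double_int32(sys.stdin.buffer.read()))


if __name__ == '__main__':
    main()
```

With this version, both cat in.dat | python3 test.py >out.dat and python3 test.py <in.dat >out.dat should take the same code path.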


That said, numpy is way overkill for what you're trying to do here. You can easily achieve this with a few lines and struct:

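import struct
import sys

FORMAT = '@i'  # one native-order C int (4 bytes on common platforms)
SIZE = struct.calcsize(FORMAT)


def main():
    while True:
        chunk = sys.stdin.buffer.read(SIZE)
        # read() returns a short (eventually empty) bytes object at end of
        # input; it does not raise EOFError
        if len(chunk) < SIZE:
            break
        # struct.unpack always returns a tuple, so unpack the single value
        (num,) = struct.unpack(FORMAT, chunk)
        sys.stdout.buffer.write(struct.pack(FORMAT, num * 2))


if __name__ == '__main__':
    main()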


Edit: there's also no need for sys.exit(0). This is the default.
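For small inputs, the same struct idea can also be written as a single pass over the whole input with struct.iter_unpack. This is a sketch under the assumption that the input fits in memory; double_all is a hypothetical helper name, and any trailing bytes that do not form a whole integer are silently dropped here.

```python
import struct
import sys

FORMAT = '@i'  # native-order C int, same format as the loop version
SIZE = struct.calcsize(FORMAT)


def double_all(raw: bytes) -> bytes:
    """Double every integer packed in raw (hypothetical helper)."""
    # iter_unpack requires a buffer whose length is a multiple of SIZE,
    # so trim any incomplete trailing item first.
    usable = len(raw) - len(raw) % SIZE
    return b''.join(
        struct.pack(FORMAT, num * 2)
        for (num,) in struct.iter_unpack(FORMAT, raw[:usable])
    )


if __name__ == '__main__':
    sys.stdout.buffer.write(double_all(sys.stdin.buffer.read()))
```

Keeping the transformation in a function that takes and returns bytes also makes it easy to check without touching stdin at all.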
