kangshiyin

Python3 pipe I/O on np.ndarray with raw binary data failed

I have a binary raw data file in.dat storing 4 int32 values.

$ xxd in.dat 
00000000: 0100 0000 0200 0000 0300 0000 0400 0000  ................

I want to read them into an np.ndarray, multiply them by 2, then write them to stdout in the same raw binary format as in.dat. The expected output is:

$ xxd out.dat 
00000000: 0200 0000 0400 0000 0600 0000 0800 0000  ................
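To reproduce the setup without hand-crafting the file, here is one way to generate the input bytes and the expected output bytes. It assumes little-endian int32, which is what the xxd dumps show (0100 0000 is the value 1 stored least-significant byte first); the function name make_sample_bytes is hypothetical.

```python
import numpy as np


def make_sample_bytes():
    """Return (input_bytes, expected_output_bytes) for the example above.

    Assumes little-endian int32 ('<i4'), matching the xxd dumps.
    """
    values = np.array([1, 2, 3, 4], dtype='<i4')
    # Re-pin the byte order after arithmetic so the output layout is explicit.
    doubled = (values * 2).astype('<i4')
    return values.tobytes(), doubled.tobytes()


if __name__ == '__main__':
    data_in, data_out = make_sample_bytes()
    print(data_in.hex())   # 01000000020000000300000004000000
    print(data_out.hex())  # 02000000040000000600000008000000
```

Writing data_in to a file opened with open('in.dat', 'wb') gives the same 16 bytes as the in.dat shown above.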

Here is the code:

#!/usr/bin/env python3

import sys
import numpy as np

if __name__ == '__main__':
    y = np.fromfile(sys.stdin, dtype='int32')
    y *= 2
    sys.stdout.buffer.write(y.astype('int32').tobytes())
    exit(0)

It works as expected when stdin is redirected from the file with <:

$ python3 test.py <in.dat >out.dat

But it fails when the data comes through a pipe (|), with this error:

$ cat in.dat | python3 test.py >out.dat
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    y = np.fromfile(sys.stdin, dtype='int32')
OSError: obtaining file position failed

What am I missing here?

Answers (2)

juanpa.arrivillaga

If you read the raw bytes from sys.stdin.buffer yourself and pass them to np.frombuffer, it works both ways:

pipebytes.py

import numpy as np
import sys
print(np.frombuffer(sys.stdin.buffer.read(), dtype=np.int32))

Now,

Juans-MacBook-Pro:temp juan$ xxd testdata.dat
00000000: 0100 0000 0200 0000 0300 0000            ............
Juans-MacBook-Pro:temp juan$ python pipebytes.py < testdata.dat
[1 2 3]
Juans-MacBook-Pro:temp juan$ cat testdata.dat | python pipebytes.py
[1 2 3]
Juans-MacBook-Pro:temp juan$

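np.frombuffer itself does not copy: it returns a read-only view of the bytes object returned by read(), so modify the data with y = y * 2 rather than the in-place y *= 2.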
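Applied to the original task (read, double, write back in the same format), a sketch of test.py along these lines might look as follows. It assumes native-order int32 input as in the question; the helper name double_int32 is hypothetical.

```python
#!/usr/bin/env python3
import sys

import numpy as np


def double_int32(raw):
    """Interpret raw bytes as native int32 values, double them, return bytes."""
    # np.frombuffer wraps the bytes object without copying, and the result is
    # read-only, so build a new array with y * 2 instead of using y *= 2.
    y = np.frombuffer(raw, dtype=np.int32)
    # astype keeps the output explicitly int32, the same layout as the input.
    return (y * 2).astype(np.int32).tobytes()


def main():
    # read() simply consumes stdin until EOF; unlike np.fromfile it never asks
    # for the file position, so it behaves the same with < and with |.
    raw = sys.stdin.buffer.read()
    sys.stdout.buffer.write(double_int32(raw))


if __name__ == '__main__':
    main()
```

Both python3 test.py <in.dat >out.dat and cat in.dat | python3 test.py >out.dat should then produce the expected out.dat.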

Bailey Parker

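This is because when you redirect a file in with <, stdin is seekable: it isn't a TTY or a pipe, just a regular file that has been opened as file descriptor 0. np.fromfile needs to query the current position of the file object it is given, and a pipe has no position, hence "obtaining file position failed". Try invoking the following script with cat foo.txt | python3 test.py vs python3 test.py <foo.txt (assuming foo.txt contains some text):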

import sys

sys.stdin.seek(1)
print(sys.stdin.read())

The former will error with:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    sys.stdin.seek(1)
io.UnsupportedOperation: underlying stream is not seekable
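The same difference can be observed without a shell. This sketch creates a regular temporary file (like what < hands the process) and an os.pipe() (like what | hands it) and asks each whether it is seekable; the function name seekability is hypothetical.

```python
import os
import tempfile


def seekability():
    """Return (regular_file_seekable, pipe_seekable)."""
    # A regular file, comparable to stdin under `python3 test.py <in.dat`.
    with tempfile.TemporaryFile() as f:
        file_seekable = f.seekable()

    # The read end of a pipe, comparable to stdin under `cat in.dat | ...`.
    read_fd, write_fd = os.pipe()
    with os.fdopen(read_fd, 'rb') as r, os.fdopen(write_fd, 'wb'):
        pipe_seekable = r.seekable()

    return file_seekable, pipe_seekable


if __name__ == '__main__':
    print(seekability())  # (True, False) on Linux and macOS
```

Printing sys.stdin.seekable() at the top of test.py shows the same split between the < and | invocations.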

That said, numpy is way overkill for what you're trying to do here. You can easily achieve this with a few lines and struct:

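import struct
import sys

FORMAT = '@i'  # one native-order C int, 4 bytes on common platforms
SIZE = struct.calcsize(FORMAT)


def main():
    while True:
        chunk = sys.stdin.buffer.read(SIZE)
        # read() signals EOF by returning fewer bytes (b'' at a clean end);
        # it never raises EOFError, so check the length instead.
        if len(chunk) < SIZE:
            break
        (num,) = struct.unpack(FORMAT, chunk)  # unpack always returns a tuple
        sys.stdout.buffer.write(struct.pack(FORMAT, num * 2))


if __name__ == '__main__':
    main()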
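If the whole input fits in memory, the manual read loop can be dropped by using struct.iter_unpack (Python 3.4+). This sketch pins little-endian 4-byte ints with '<i', matching the xxd dump of in.dat, and assumes the input length is a multiple of 4; the name double_all is hypothetical.

```python
import struct

FORMAT = '<i'  # little-endian, standard 4-byte int


def double_all(data):
    """Double every 4-byte little-endian int in `data` and return the new bytes.

    struct.iter_unpack raises struct.error if len(data) is not a multiple of 4.
    """
    out = bytearray()
    for (n,) in struct.iter_unpack(FORMAT, data):  # yields 1-tuples
        out += struct.pack(FORMAT, n * 2)
    return bytes(out)
```

In a script this would be used as sys.stdout.buffer.write(double_all(sys.stdin.buffer.read())), which, like np.frombuffer, does not depend on stdin being seekable.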

Edit: there's also no need for the exit(0) at the end; a script that runs to completion exits with status 0 by default.
