Reputation: 9779
I have a binary raw data file, in.dat, storing 4 int32 values.
$ xxd in.dat
00000000: 0100 0000 0200 0000 0300 0000 0400 0000 ................
I want to read them into an np.ndarray, multiply them by 2, and then write them to stdout in the same raw binary format as in.dat. The expected output is:
$ xxd out.dat
00000000: 0200 0000 0400 0000 0600 0000 0800 0000 ................
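To reproduce this, the input and expected output bytes can be produced with the standard struct module. This is a sketch that assumes four little-endian 32-bit integers, which is what the xxd dumps show; the helper name pack_int32s is just for illustration.

```python
import struct

def pack_int32s(values):
    # '<Ni' = N little-endian 4-byte signed ints, the layout shown by xxd above
    return struct.pack(f'<{len(values)}i', *values)

in_bytes = pack_int32s([1, 2, 3, 4])
out_bytes = pack_int32s([2 * v for v in [1, 2, 3, 4]])
print(in_bytes.hex(' ', 2))   # same 2-byte grouping as xxd
print(out_bytes.hex(' ', 2))

if __name__ == '__main__':
    # Write the sample input so the commands below can be run as-is
    with open('in.dat', 'wb') as f:
        f.write(in_bytes)
```

The printed hex groups should match the two xxd dumps.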
Here is the code:
#!/usr/bin/env python3
import sys
import numpy as np
if __name__ == '__main__':
    y = np.fromfile(sys.stdin, dtype='int32')
    y *= 2
    sys.stdout.buffer.write(y.astype('int32').tobytes())
    exit(0)
It works as expected when stdin is redirected with <:
$ python3 test.py <in.dat >out.dat
But it does not work with a pipe (|). Here is the error message:
$ cat in.dat | python3 test.py >out.dat
Traceback (most recent call last):
File "test.py", line 7, in <module>
y = np.fromfile(sys.stdin, dtype='int32')
OSError: obtaining file position failed
What am I missing here?
Upvotes: 2
Views: 601
Reputation: 96236
If you use np.frombuffer, it should work both ways:
pipebytes.py
import numpy as np
import sys
print(np.frombuffer(sys.stdin.buffer.read(), dtype=np.int32))
Now,
Juans-MacBook-Pro:temp juan$ xxd testdata.dat
00000000: 0100 0000 0200 0000 0300 0000 ............
Juans-MacBook-Pro:temp juan$ python pipebytes.py < testdata.dat
[1 2 3]
Juans-MacBook-Pro:temp juan$ cat testdata.dat | python pipebytes.py
[1 2 3]
Juans-MacBook-Pro:temp juan$
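Note, though, that np.frombuffer itself does not copy the data: it returns a read-only view of the bytes object returned by read(), so an in-place operation like y *= 2 will fail. Use y = y * 2 instead, or take a writable .copy() first (which does copy the data).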
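Applied to the original task, a minimal sketch might look like the following. The helper name double_int32 is hypothetical, and it assumes the input length is a multiple of 4 bytes (np.frombuffer raises ValueError otherwise). Because the array returned by np.frombuffer over a bytes object is read-only, it multiplies out of place rather than with y *= 2.

```python
#!/usr/bin/env python3
import sys
import numpy as np

def double_int32(raw: bytes) -> bytes:
    # frombuffer wraps the bytes without copying; the array is read-only,
    # so multiply out of place instead of using y *= 2.
    y = np.frombuffer(raw, dtype=np.int32)
    return (y * 2).astype(np.int32).tobytes()

if __name__ == '__main__':
    # sys.stdin.buffer.read() works the same for a redirected file or a pipe.
    sys.stdout.buffer.write(double_int32(sys.stdin.buffer.read()))
```

Usage is the same as the original script: both python3 test.py <in.dat >out.dat and cat in.dat | python3 test.py >out.dat should produce the expected out.dat.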
Upvotes: 2
Reputation: 15905
This is because when you redirect a file in with <, stdin is seekable: it isn't a TTY or a pipe, just a regular file that has been given FD 0. np.fromfile needs to obtain the current file position of the stream it is given, and that fails on a pipe. Try invoking the following script with cat foo.txt | python3 test.py versus python3 test.py <foo.txt (assuming foo.txt contains some text):
import sys
sys.stdin.seek(1)
print(sys.stdin.read())
The former will error with:
Traceback (most recent call last):
File "test.py", line 3, in <module>
sys.stdin.seek(1)
io.UnsupportedOperation: underlying stream is not seekable
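To see the same difference without going through the shell, here is a small sketch that asks Python's io layer whether a regular-file descriptor and a pipe descriptor can seek. The helper is_seekable_fd is hypothetical, and tempfile and os.pipe stand in for < in.dat and cat in.dat | respectively; on Linux and macOS it should report True for the file and False for the pipe.

```python
import os
import tempfile

def is_seekable_fd(fd: int) -> bool:
    # Wrap the descriptor without taking ownership (closefd=False),
    # then ask the io layer whether lseek() works on it.
    with open(fd, 'rb', closefd=False) as f:
        return f.seekable()

if __name__ == '__main__':
    # A regular file: what the shell hands the process for `< in.dat`.
    with tempfile.TemporaryFile() as tmp:
        print('regular file seekable:', is_seekable_fd(tmp.fileno()))
    # A pipe: what the shell hands the process for `cat in.dat | ...`.
    r, w = os.pipe()
    try:
        print('pipe seekable:', is_seekable_fd(r))
    finally:
        os.close(r)
        os.close(w)
```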
That said, NumPy is overkill for what you're trying to do here. You can achieve this with a few lines of struct:
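import struct
import sys
FORMAT = '@i'  # one native-endian C int (4 bytes on common platforms)
SIZE = struct.calcsize(FORMAT)
def main():
    while True:
        chunk = sys.stdin.buffer.read(SIZE)
        # read() returns short or empty bytes at end of input; it never raises EOFError
        if len(chunk) < SIZE:
            break
        # unpack() always returns a tuple, even for a single value
        (num,) = struct.unpack(FORMAT, chunk)
        sys.stdout.buffer.write(struct.pack(FORMAT, num * 2))
if __name__ == '__main__':
    main()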
Edit: there's also no need for sys.exit(0) (or exit(0)); exiting with status 0 is the default when the script finishes normally.
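If reading one integer at a time is too slow for large inputs, a variant under the same assumptions (native-endian '@i', input length a multiple of 4 bytes) can read everything at once and use struct.iter_unpack. The helper name double_all is hypothetical, and unlike the loop above it keeps the whole input in memory.

```python
import struct
import sys

FORMAT = '@i'  # native-endian C int, as in the loop version

def double_all(raw: bytes) -> bytes:
    # iter_unpack walks the buffer in FORMAT-sized steps and yields 1-tuples;
    # it raises struct.error if len(raw) is not a multiple of calcsize(FORMAT).
    return b''.join(struct.pack(FORMAT, num * 2) for (num,) in struct.iter_unpack(FORMAT, raw))

if __name__ == '__main__':
    sys.stdout.buffer.write(double_all(sys.stdin.buffer.read()))
```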
Upvotes: 2