Reputation: 222621
I am reading some source code that downloads a zip file and reads the data into a NumPy array. The code is supposed to work on macOS and Linux, and here is the snippet that I see:
def _read32(bytestream):
    dt = numpy.dtype(numpy.uint32).newbyteorder('>')
    return numpy.frombuffer(bytestream.read(4), dtype=dt)
This function is used in the following context:
with gzip.open(filename) as bytestream:
    magic = _read32(bytestream)
It is not hard to see what happens here, but I am puzzled by the purpose of newbyteorder('>'). I have read the documentation and know what endianness means, but I cannot understand why exactly the developer added newbyteorder (in my opinion it is not really needed).
Upvotes: 11
Views: 3094
Reputation: 176880
It is just a way of ensuring that the bytes in the resulting array are interpreted in the correct order, regardless of a system's native byte order.
By default, the built-in NumPy integer dtypes use the byte order that is native to your system. For example, my system is little-endian, so simply using the dtype numpy.dtype(numpy.uint32) will mean that values read into an array from a buffer with the bytes in big-endian order will not be interpreted correctly.
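To make the difference concrete, here is a minimal sketch that reads the same four bytes twice, once with the native dtype and once with an explicitly big-endian dtype. The buffer contents and the variable names (buf, native, big) are made up for illustration; they are not from the code in the question.

```python
import sys
import numpy as np

# Four bytes that encode the integer 1 in big-endian (most significant byte first) order.
buf = bytes([0x00, 0x00, 0x00, 0x01])

# Native dtype: the result depends on the byte order of the machine running this.
native = np.frombuffer(buf, dtype=np.uint32)

# Explicit big-endian dtype: the result is the same on every machine.
big = np.frombuffer(buf, dtype=np.dtype(np.uint32).newbyteorder('>'))

print(sys.byteorder)   # 'little' or 'big', depending on the machine
print(int(big[0]))     # 1 everywhere
print(int(native[0]))  # 16777216 on a little-endian machine, 1 on a big-endian one
```

On a little-endian machine the native read sees the bytes as 0x01000000, which is why the value comes out as 16777216 instead of 1.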
If np.frombuffer is meant to receive bytes that are known to be in a particular byte order, the best practice is to modify the dtype using newbyteorder. This is mentioned in the documentation for np.frombuffer:
Notes
If the buffer has data that is not in machine byte-order, this should be specified as part of the data-type, e.g.:
>>> dt = np.dtype(int)
>>> dt = dt.newbyteorder('>')
>>> np.frombuffer(buf, dtype=dt)
The data of the resulting array will not be byteswapped, but will be interpreted correctly.
Upvotes: 4
Reputation: 2073
That's because the downloaded data is in big-endian format, as described on the source page: http://yann.lecun.com/exdb/mnist/
All the integers in the files are stored in the MSB first (high endian) format used by most non-Intel processors. Users of Intel processors and other low-endian machines must flip the bytes of the header.
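As a rough sketch of what "flipping the bytes of the header" amounts to in practice, the snippet below builds a fake gzip-compressed header in memory and reads it back with a big-endian dtype. The header values (magic number 2051, 60000 images, 28 rows, 28 columns) follow the image-file layout described on the MNIST page; the in-memory stream and the variable names are assumptions made for illustration, standing in for the real downloaded file.

```python
import gzip
import io
import struct
import numpy as np

def _read32(bytestream):
    # Read 4 bytes and interpret them as one big-endian unsigned 32-bit integer.
    dt = np.dtype(np.uint32).newbyteorder('>')
    return int(np.frombuffer(bytestream.read(4), dtype=dt)[0])

# Hypothetical stand-in for a downloaded file: only the 16-byte header,
# packed MSB first ('>') and then gzip-compressed.
header = struct.pack('>IIII', 2051, 60000, 28, 28)
compressed = gzip.compress(header)

# gzip.open also accepts an existing file object, so no file on disk is needed.
with gzip.open(io.BytesIO(compressed)) as bytestream:
    magic = _read32(bytestream)
    num_images = _read32(bytestream)
    rows = _read32(bytestream)
    cols = _read32(bytestream)

print(magic, num_images, rows, cols)
```

Without newbyteorder('>'), the same reads on a little-endian machine would return byte-swapped values, so a check of the magic number against 2051 would fail.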
Upvotes: 8