Vectorized byte-position conversion with numpy?

Question

I have a list of bytes objects, like so:

b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)

b = b1 + b2

stream = [b] * 10

You could imagine this as an array like:

b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'

To convert each position correctly, I would do the following (knowing which positions are signed and which are not):

for line in stream:
    c1 = int.from_bytes(line[0:2], byteorder="little", signed=False)
    c2 = int.from_bytes(line[2:4], byteorder="little", signed=True)

But looping like this is extremely inefficient. Given that I know the positions of the "columns", how would I do this with numpy in a column-wise, vectorized fashion?

juanpa.arrivillaga · Accepted Answer

You can do this using a structured array. So given:

In [1]: b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
   ...: b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)
   ...:
   ...: b = b1 + b2
   ...:
   ...: stream = [b] * 10

In [2]: for line in stream:
   ...:     c1 = int.from_bytes(line[0:2], byteorder="little", signed=False)
   ...:     c2 = int.from_bytes(line[2:4], byteorder="little", signed=True)
   ...:     print(c1, c2)
   ...:
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987

Create a single buffer by joining the bytes objects, then pass a structured dtype to numpy.frombuffer:

In [3]: import numpy as np

In [4]: buffer = b''.join(stream)

In [5]: arr = np.frombuffer(buffer, dtype=np.dtype([('x', '<u2'), ('y', '<i2')]))


Note that the names I gave, 'x' and 'y', were just placeholders. Use whatever you want. Whichever names you choose, you can use them to index into the structured array:

In [8]: arr['x']
Out[8]: array([123, 123, 123, 123, 123, 123, 123, 123, 123, 123], dtype=uint16)

In [9]: arr['y']
Out[9]:
array([-987, -987, -987, -987, -987, -987, -987, -987, -987, -987],
      dtype=int16)
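
For reference, the steps above can be collected into one self-contained script. This is a minimal sketch, assuming every record is exactly 4 bytes (a little-endian uint16 followed by a little-endian int16, matching the dtypes shown in the output above); the name `record` and the comparison against the question's loop are illustrative additions, not part of the original session.

```python
import numpy as np

# Rebuild the example stream from the question: one unsigned and one signed
# 16-bit little-endian integer per 4-byte record.
b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)
stream = [b1 + b2] * 10

# Reference result from the pure-Python loop in the question.
expected = [
    (int.from_bytes(line[0:2], byteorder="little", signed=False),
     int.from_bytes(line[2:4], byteorder="little", signed=True))
    for line in stream
]

# Placeholder field names; '<u2' is little-endian uint16, '<i2' is little-endian int16.
record = np.dtype([("x", "<u2"), ("y", "<i2")])

# frombuffer needs one contiguous buffer whose length is a multiple of the record size.
buffer = b"".join(stream)
arr = np.frombuffer(buffer, dtype=record)

# Each field comes back as a whole column.
print(arr["x"].tolist())
print(arr["y"].tolist())
```

Because the buffer here is an immutable bytes object, the resulting array is read-only; call arr.copy() if you need to modify the values.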

Note: if you don't care about the names, you can use the shorthand dtype spec:

In [10]: np.frombuffer(buffer, dtype=np.dtype('<u2, <i2'))

You can read more about specifying dtypes in the official NumPy documentation on data type objects.
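
If you use the shorthand dtype spec, the fields are still addressable: NumPy assigns default field names 'f0', 'f1', and so on. A small sketch, assuming the same 4-byte record layout as in the question (the variable names are illustrative):

```python
import numpy as np

# One 4-byte record: little-endian uint16 followed by little-endian int16.
record_bytes = (123).to_bytes(2, "little") + (-987).to_bytes(2, "little", signed=True)
buffer = record_bytes * 10

# Comma-separated shorthand; NumPy generates the field names itself.
arr = np.frombuffer(buffer, dtype=np.dtype("<u2,<i2"))

print(arr.dtype.names)

# The generated names index the columns just like explicit ones.
first = arr["f0"]
second = arr["f1"]
print(first.tolist())
print(second.tolist())
```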
