komodovaran_

Reputation: 2012

Vectorized byte-position conversion with numpy?

I have a list of bytes objects, like so:

b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)

b = b1 + b2

stream = [b] * 10

You can think of it as an array of rows like this:

b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'

To convert each position correctly, I would do the following (knowing which positions are signed and which are not):

for line in stream:
    c1 = int.from_bytes(line[0:2], byteorder="little", signed=False)
    c2 = int.from_bytes(line[2:4], byteorder="little", signed=True)

But looping in Python like this is extremely inefficient. Given that I know the byte positions of the "columns", how can I do this conversion with numpy in a column-wise, vectorized fashion?

Upvotes: 1

Views: 99

Answers (2)

juanpa.arrivillaga

Reputation: 96171

You can do this using a structured array. So given:

In [1]: b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
   ...: b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)
   ...:
   ...: b = b1 + b2
   ...:
   ...: stream = [b] * 10

In [2]: for line in stream:
   ...:     c1 = int.from_bytes(line[0:2], byteorder="little", signed=False)
   ...:     c2 = int.from_bytes(line[2:4], byteorder="little", signed=True)
   ...:     print(c1, c2)
   ...:
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987

Then create a single buffer by joining the bytes objects, and pass it to numpy.frombuffer together with a structured dtype:

In [3]: import numpy as np

In [4]: buffer = b''.join(stream)

In [5]: arr = np.frombuffer(buffer, dtype=np.dtype([('x','<u2'), ('y','<i2')]))

In [6]: arr
Out[6]:
array([(123, -987), (123, -987), (123, -987), (123, -987), (123, -987),
       (123, -987), (123, -987), (123, -987), (123, -987), (123, -987)],
      dtype=[('x', '<u2'), ('y', '<i2')])

Note that the names I gave, 'x' and 'y', are just placeholders; use whatever you want. Whatever names you choose, you can use them to index into the structured array:

In [8]: arr['x']
Out[8]: array([123, 123, 123, 123, 123, 123, 123, 123, 123, 123], dtype=uint16)

In [9]: arr['y']
Out[9]:
array([-987, -987, -987, -987, -987, -987, -987, -987, -987, -987],
      dtype=int16)
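
Putting the steps above together, here is a minimal sketch that wraps the conversion in a function and checks it against the pure-Python loop from the question. The names `decode_columns`, `RECORD_DTYPE`, `c1` and `c2` are placeholders I made up for illustration, and the sketch assumes every record has exactly the same 4-byte layout.

```python
import numpy as np

# Assumed record layout: 2-byte unsigned little-endian, then 2-byte signed little-endian.
RECORD_DTYPE = np.dtype([("c1", "<u2"), ("c2", "<i2")])


def decode_columns(stream):
    """Decode a list of equally sized bytes records into a structured array."""
    buffer = b"".join(stream)  # one contiguous buffer, no per-record parsing
    return np.frombuffer(buffer, dtype=RECORD_DTYPE)


b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)
stream = [b1 + b2] * 10

arr = decode_columns(stream)

# Reference values computed with the original loop-based approach.
expected_c1 = [int.from_bytes(line[0:2], byteorder="little", signed=False) for line in stream]
expected_c2 = [int.from_bytes(line[2:4], byteorder="little", signed=True) for line in stream]

print(arr["c1"].tolist() == expected_c1)
print(arr["c2"].tolist() == expected_c2)
```

Each field access such as `arr["c1"]` returns a whole column at once, which is what makes this column-wise rather than row-wise.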

If you don't care about the names, you can use the shorthand dtype spec, in which case numpy assigns default field names ('f0', 'f1', ...):

In [10]: np.frombuffer(buffer, dtype=np.dtype('<u2,<i2'))
Out[10]:
array([(123, -987), (123, -987), (123, -987), (123, -987), (123, -987),
       (123, -987), (123, -987), (123, -987), (123, -987), (123, -987)],
      dtype=[('f0', '<u2'), ('f1', '<i2')])
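
With the shorthand spec, the columns are still reachable through the auto-generated field names shown in the output above. A small sketch, rebuilding the same example buffer so it runs on its own:

```python
import numpy as np

b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)
buffer = b"".join([b1 + b2] * 10)

# Shorthand spec: numpy names the fields 'f0', 'f1', ... automatically.
shorthand = np.dtype("<u2,<i2")
arr = np.frombuffer(buffer, dtype=shorthand)

print(shorthand.names)     # field names assigned by numpy
print(shorthand.itemsize)  # bytes per record
print(arr["f0"])           # unsigned column
print(arr["f1"])           # signed column
```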

You can read more about specifying dtypes in the official numpy docs.

Upvotes: 1

Daniel

Reputation: 42778

Use structured numpy arrays:

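import numpy as np

# Build 100 example records and serialize them to raw bytes
data = np.zeros(100, dtype=[('a', '<u2'),('b','<i2')])
data['a'] = 123
data['b'] = -987
stream = data.tobytes()

# Decode the raw bytes back into a structured array
data = np.frombuffer(stream, dtype=[('a', '<u2'),('b','<i2')])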
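
As a quick sanity check of the round trip, the sketch below verifies that the decoded columns match what was written. It also shows one thing worth knowing: an array created by np.frombuffer from an immutable bytes object is read-only, so take a copy if you need to modify it. The variable names `decoded` and `editable` are mine.

```python
import numpy as np

record = np.dtype([("a", "<u2"), ("b", "<i2")])

# Encode 100 identical records to raw bytes.
data = np.zeros(100, dtype=record)
data["a"] = 123
data["b"] = -987
stream = data.tobytes()

# Decode them again; this is a view on the bytes, not a copy.
decoded = np.frombuffer(stream, dtype=record)

print(decoded.tobytes() == stream)  # round trip preserves the bytes
print(decoded.flags.writeable)      # the view over bytes is read-only

# Copy to get an independent, writable array.
editable = decoded.copy()
editable["a"][0] = 1
```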

Upvotes: 0
