Reputation: 2012
I have a list of bytes objects, like so:
b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)
b = b1 + b2
stream = [b] * 10
You could picture it as an array like this:
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
To convert each position correctly, I would do the following (knowing which positions are signed and which are not):
for line in stream:
    c1 = int.from_bytes(line[0:2], byteorder="little", signed=False)
    c2 = int.from_bytes(line[2:4], byteorder="little", signed=True)
But looping like this is extremely inefficient. Given that I know the positions of the "columns", how would I do this with numpy in a column-wise, vectorized fashion?
Upvotes: 1
Views: 99
Reputation: 96171
You can do this using a structured array. So given:
In [1]: b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
   ...: b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)
   ...:
   ...: b = b1 + b2
   ...:
   ...: stream = [b] * 10
In [2]: for line in stream:
   ...:     c1 = int.from_bytes(line[0:2], byteorder="little", signed=False)
   ...:     c2 = int.from_bytes(line[2:4], byteorder="little", signed=True)
   ...:     print(c1, c2)
   ...:
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
Then create a single buffer by joining the bytes, and pass it to numpy.frombuffer together with a structured dtype:
In [3]: import numpy as np
In [4]: buffer = b''.join(stream)
In [5]: arr = np.frombuffer(buffer, dtype=np.dtype([('x','<u2'), ('y','<i2')]))
In [6]: arr
Out[6]:
array([(123, -987), (123, -987), (123, -987), (123, -987), (123, -987),
       (123, -987), (123, -987), (123, -987), (123, -987), (123, -987)],
      dtype=[('x', '<u2'), ('y', '<i2')])
Note that the names I gave, 'x' and 'y', are just placeholders. Use whatever you want. Whichever names you choose, you can use them to index into the structured array:
In [8]: arr['x']
Out[8]: array([123, 123, 123, 123, 123, 123, 123, 123, 123, 123], dtype=uint16)
In [9]: arr['y']
Out[9]:
array([-987, -987, -987, -987, -987, -987, -987, -987, -987, -987],
      dtype=int16)
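As a sanity check, here is a minimal sketch that compares the per-record loop from the question against the structured-array decode. The sample values in `pairs` and the name `record` are made up for illustration; they are chosen to hit the edges of the unsigned and signed 16-bit ranges.

```python
import numpy as np

# Hypothetical sample records: (unsigned 16-bit, signed 16-bit) pairs.
pairs = [(123, -987), (65535, -32768), (0, 32767)]
stream = [
    u.to_bytes(2, "little", signed=False) + s.to_bytes(2, "little", signed=True)
    for u, s in pairs
]

# Reference decode: the per-record Python loop from the question.
looped = [
    (
        int.from_bytes(line[0:2], "little", signed=False),
        int.from_bytes(line[2:4], "little", signed=True),
    )
    for line in stream
]

# Vectorized decode: one structured dtype describing the 4-byte record.
record = np.dtype([("x", "<u2"), ("y", "<i2")])
arr = np.frombuffer(b"".join(stream), dtype=record)

print(arr["x"].tolist())  # [123, 65535, 0]
print(arr["y"].tolist())  # [-987, -32768, 32767]
```

Both approaches should agree field by field, so `arr["x"]` and `arr["y"]` can stand in for the `c1` and `c2` values computed in the loop.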
Note, if you don't care about the names, you can use the shorthand dtype spec:
In [10]: np.frombuffer(buffer, dtype=np.dtype('<u2,<i2'))
Out[10]:
array([(123, -987), (123, -987), (123, -987), (123, -987), (123, -987),
       (123, -987), (123, -987), (123, -987), (123, -987), (123, -987)],
      dtype=[('f0', '<u2'), ('f1', '<i2')])
You can read more about specifying dtypes in the official NumPy documentation.
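A couple of practical details follow from the above, sketched here under the same example data (the names `dt` and `editable` are just illustrative). With the shorthand spec, the fields get the default names f0 and f1, each record is 4 bytes, and the joined buffer has to be a whole number of records. Also, an array created by frombuffer over a bytes object is read-only, so copy it if you need to modify the values.

```python
import numpy as np

dt = np.dtype("<u2,<i2")
print(dt.names)     # ('f0', 'f1')
print(dt.itemsize)  # 4

# Same 4-byte record as in the question, repeated 10 times.
record_bytes = int(123).to_bytes(2, "little") + int(-987).to_bytes(2, "little", signed=True)
buffer = record_bytes * 10

# frombuffer raises ValueError if the buffer is not a whole number of records.
assert len(buffer) % dt.itemsize == 0
arr = np.frombuffer(buffer, dtype=dt)

# The array is a read-only view onto the immutable bytes object.
print(arr.flags.writeable)  # False

# Take a copy when the decoded values need to be changed.
editable = arr.copy()
editable["f0"][0] = 7
```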
Upvotes: 1
Reputation: 42778
Use structured numpy arrays:
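import numpy as np

# Record layout: unsigned 16-bit, then signed 16-bit, both little-endian
dtype = [('a', '<u2'), ('b', '<i2')]
data = np.zeros(100, dtype=dtype)
data['a'] = 123
data['b'] = -987
stream = data.tobytes()  # serialize all records to raw bytes
data = np.frombuffer(stream, dtype=dtype)  # decode all records at once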
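To confirm that the round trip is lossless, a small check along these lines might help. It is only a sketch; the names `record` and `decoded` are illustrative and not part of the snippet above.

```python
import numpy as np

record = np.dtype([("a", "<u2"), ("b", "<i2")])

data = np.zeros(100, dtype=record)
data["a"] = 123
data["b"] = -987

stream = data.tobytes()
decoded = np.frombuffer(stream, dtype=record)

print(len(stream))  # 400 (100 records of 4 bytes each)
print(stream[:4])   # b'{\x00%\xfc'

# Compare field by field against the original array.
print(bool((decoded["a"] == data["a"]).all()))  # True
print(bool((decoded["b"] == data["b"]).all()))  # True
```

The first four bytes match the record shown in the question, so the same dtype should decode the original stream once its records are joined into one buffer.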
Upvotes: 0