When I save a numpy array to bytes and then to a string, I have trouble converting it back to a numpy.ndarray object.
The workflow is as follows:
1. Convert the numpy.ndarray to bytes with the numpy.ndarray.tobytes() method.
2. Convert the bytes to a string with the str() function.
3. Convert the string back to a numpy.ndarray object.
The reason I need to convert a str object back to a numpy.ndarray in the first place is that when I store numpy vectors in a pandas.DataFrame object and save it to a csv file, all of its values are automatically converted to strings.
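If you control how the vectors are written to the CSV, one alternative worth considering is to store base64 text in the DataFrame instead of str() of the bytes. This is a different technique from the str() workflow above: base64 output is plain ASCII, survives the CSV round trip unchanged, and decodes straight back to the original bytes. A minimal sketch, assuming 1-D float64 vectors; encode_vector and decode_vector are hypothetical helper names, and the dtype (plus the shape, for multi-dimensional arrays) has to be known when reading the data back:

```python
import base64

import numpy as np


def encode_vector(vec: np.ndarray) -> str:
    """Hypothetical helper: raw array bytes -> ASCII-only base64 text for a CSV cell."""
    return base64.b64encode(vec.tobytes()).decode("ascii")


def decode_vector(text: str, dtype=np.float64) -> np.ndarray:
    """Hypothetical helper: base64 text -> bytes -> array; dtype must match what was encoded."""
    return np.frombuffer(base64.b64decode(text), dtype=dtype)


vector_np = np.array([1.06229002e-09, 1.91655440e-10, -1.64956463e-16])
cell = encode_vector(vector_np)  # store this string in the DataFrame column
print(cell)
print(np.array_equal(decode_vector(cell), vector_np))  # True
```

Note that np.frombuffer returns a read-only view of the decoded bytes, so call .copy() if the vector has to be modified afterwards.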
vector_np
array([ 1.06229002e-09, 1.91655440e-10, -1.64956463e-16, 1.96307718e-15,
1.70059011e-09, -7.69618695e-10, 1.23360626e-10, 3.63022924e-13,
8.98514856e-09, -1.36133589e-13, -7.49299599e-13, 1.66008671e-13,
-4.21360477e-19, 7.89110884e-10, -2.16149680e-10, -1.26254478e-10,
2.02095242e-25, -1.26993445e-12, -8.12166451e-18, 2.23239724e-11,
-5.50037583e-11, -1.53251136e-13, -3.10830309e-12, 2.30680945e-10,
-8.10731206e-26, 2.60155773e-13, -1.06329112e-14, 4.78776823e-12,
-4.07784303e-10, -8.77197289e-13, 1.77004211e-09, -9.20980905e-17,
1.43903266e-18, 5.07994419e-10, 4.98258585e-11, 8.73321720e-12,
6.29363312e-12, -1.58257277e-13, 8.08954343e-10, 8.14411205e-12,
-1.68514957e-11, -3.08011938e-22, -7.01468987e-10, 5.53965202e-10,
1.04966575e-14, 7.65319571e-12, -8.68981408e-11, -5.46472476e-13,
1.45874458e-17, 2.25920328e-13, -3.61730974e-14, 8.72030069e-15,
-1.79377261e-10, 4.44089262e-13, -5.83730415e-11, 8.98902950e-18,
-4.84719291e-12, 3.52673686e-20, 2.60145543e-15, 7.83406491e-12,
-3.19562609e-21, -2.28668156e-17, 4.01647830e-19, 1.58392215e-17,
1.63694860e-16, -4.43999002e-10, -7.19122365e-17, 3.52041690e-13,
5.89879618e-12, 1.06646093e-12, -7.04403754e-10, 4.81269166e-22,
-2.06538261e-29, -6.74965479e-17, 5.97543508e-20, -9.45708383e-15,
7.26174934e-12, 2.95691722e-17, -3.74215822e-10, 5.08219844e-21,
2.71608255e-11, 6.14458158e-14, -7.87953445e-11, 8.16108793e-18,
-2.53211721e-11, 2.98386775e-22, 3.62309568e-10, -6.23743793e-15,
-1.06038806e-11, -2.94587732e-16, -1.90497921e-10, 1.59673419e-17,
-1.62748671e-16, -2.75335439e-12, 9.76176482e-17, -1.22376910e-16,
2.59891188e-36, 2.97136378e-14, 4.42272559e-14, -1.15898610e-10,
-2.18070537e-15, -8.88566818e-12, -1.18584628e-20, 5.86762942e-19,
-1.11779358e-15, 6.57833768e-12, 1.47069543e-11, 2.32702798e-13,
5.17767605e-15, -4.11504103e-19, 1.09842176e-10, -1.09552797e-10,
-3.84399099e-17, -3.98524155e-10, 1.53404446e-23, -1.23608640e-11,
1.37235730e-12, 1.71359190e-15, -5.60941360e-11, 1.57248040e-14,
3.53669384e-11, -3.31775450e-09, 7.94055023e-19, 1.09552752e-12,
2.58000780e-16, -7.00311049e-11, 1.36630932e-11, -1.52650425e-11,
6.35766348e-12, 3.91606283e-15, 1.89650420e-12, 3.79430078e-12,
5.19628571e-17, 2.16840557e-18, -1.44380654e-22, 7.19658659e-10,
-8.76835961e-11, -6.63517982e-11, 1.02506433e-18, 2.09933218e-14,
2.03287939e-20, 5.20417107e-17, -2.82602334e-15, 8.01915600e-17,
-2.42230583e-12, 4.02554982e-12, 1.55936019e-14, -5.02367771e-19,
1.08656764e-16, 9.23705686e-14])
vector_bytes = vector_np.tobytes()
vector_bytes
b'/:l=\x00@\x12>\x00\x00\xcd\xbcFW\xea=tq\x08\xbe\xd1\xc5\xa7\xbc\xa3\xbb:<\x8c\xae\xe1<\x00x\xc3=F7\x1d>t\xd1\xb5;\xa3q\n\xbe\xe9\xa2[\xbc]\xf4\xe0=\xbah]\xbe\xa3\x8bY=tq\x8e=\xa3KC>t\xd1F\xbd\xba(C\xbd/"\x04>\x17]j\xbdF\xc7\x11\xbd\x17]G=\x17\xfd\x06=F\x17\x1f\xbc\xba\x90\xa9\xbd\x17\x1d\x0b>\xa3\x8b\xeb<\x17\xb5\xed\xbd\x00\x00\x0f=/Z\xe1\xbd\x00\xa0\xd9=\xd1E\xcf:]t\x1c\xbdFWv\xbd]\xe4\x90=/\xbab\xbc]T\t\xbd\xa3\x8b\xb8=\x8c.\xff=\x17=\xce\xbdF\x17\xbf=t\x91E\xbdt\x11\xa2\xbcFW\x8b\xbdt\xd1]<]\xb4\xef=t\xd1_=F\x17\xb9\xba\xd1\xc5F\xbd\x8cNR=\xa3\x0b\xaf\xbct\xf1\x07\xbdt\xd1\x0e=\x8c\x0e\x95=\xd1EK<\xd1\x05\xfc\xbd/:\x9a<\x17\xddn\xbdF\x93\xc3=\xbah\x1e>]<\x03\xbe\xa3\x8b\x9a\xbcF\x17\x0b\xbd\xa3\x8b:<]\xf4\xf9\xbc]t\x01>/\xba)\xbe]d\xcb=\xa3\x8b4=]4\xa3=\xe9"\xe4=\x00\xae\x9b=\xba\xe83\xbe\xd1EF\xbd\x8c.\r=\xa3\xcb\x0b>\x17uv=\xba\xe8\xa1=]\xb4\x06=F\x87\xb2\xbd\xba(\x9c=\xd1Ew\xbbt\xd1\xd7\xbb/\x1a\x08\xbeF\x17\n\xbd\xba\x08\x03>\xa3+\xc6=\xe9\xa2\x07=\x17]M\xbd]\xd4\xa0=\xd1\x05\xf1\xbc\xe9\xe2\xd7\xbd\x17\xdd\xcb\xbc/:c\xbd]\xf4\x0f>t\xd1p<\x8c\xae!=\xa3\xcbO=\xba\xb8M\xbd\x17]$\xbdt\xd1\x82=\xe9\xa2\x03=\xe9"W\xbcF\xa7\xe8\xbd\xe9\xa2:=\x00@_=\xd1E{;\xa3\x0b\xd0\xbd/\xba\x88\xbb/\xbad<\xbah\x80<tQ\x95\xbd\xba\xc8\xd3=t\xd1\xe4;]\xf4/=\x8cn\xe7<\x17}\xc8\xbc/:\xa1=t\x15\xb7\xbd\x8c.\xae\xbb\x00\x80E\xbd\x17]z\xbc\xa3\x8b.\xba\xe9\xa2\x1d<\xd1E*\xbd\xe9Br<\xd1E\xe0<F\x97\xa7<\xba\xe8b\xbd\xe9\x82\xfe\xbd\xe9b\xb2\xbd/\xba\x94\xbc]\xd4\x8b\xbd\xd1\xc5X=F\xd7\x9b=t\xf1\x99=\x8c.\xe4\xbc\xe9\xc2r=t\x91\xb9=\x004\x08\xbe\xba8m\xbd\x8c.\x82;\xa3\xab\x84\xbd\x8c.\xfa\xb9\xba`0\xbe]t\x93\xbctQ\x84=\xe9\xa2\xf1;\x17\xfd\xb2=\xa3K\x05\xbd\xd1\xc5M\xbe\x00\xf0\x9f=t\x81\xa2\xbd\xa3\x0b\x81<t\xf1\xd4<F\xb7\xf9\xbdF\x97\x0b<\x00\x00\xb8;\xe9\xa2R\xbd\x17\xdd\xbd=\x17]/=\xa3K1=\x17]h\xbc\xba\xa8\xd5\xbd\xbaXC\xbdt\xd1b<\x00`\x11\xbdF\xd7\xbb\xbd/\xfa\x0b\xbe\xa3\x8bv;\x00\x00\r<\xd1\xe5\xf8=/:\xeb\xbcF\x17\xfc\xbc\xa3\x8b,>tQ\xa7\xbd\xd1E\x8c;/:\xb5\xbc\x8c.\xa2\xbb\x8c.\xea\xbd\x17\x9d\x81=\xbahr</\xba\x9f\xbd]t\xa7\xbc\x17\x89\x1e\xbe\x008\x88\xbd\xa3\x0b\xad=\xe9"\x9c<t\xd1U;\xe9\xa2\xa1\xbc\x17]\xa1=\xe9\xa2\x8b8\xe9\x12\x18\xbe/\xba =\xe9\xa2\x96\xbd\xd1\xe5(=\xd1\xb5\xcb\xbd\xa3\xdb\xdf\xbd\xd1%\xc7=]\xa4\xe3\xbc\x17]\xa3\xbd/\x8a\xa3\xbdta\x1f>\x00\x00\xcc\xbb\x00\x00o\xbd\xd1\xa5%<\x17]<\xbc\xe9"\xd4\xbc]\x9c9\xbe\x8c\xee\x9c=\xd1\x05\x81\xbd\xa3+\xb0=\xa3\xebH=\x00`P=]\xf4`=tQ\xf7<\xba\xe8x\xbd\x17]\x1e\xbc]\xf4\x90\xbct1\xde=]\xd4/\xbd\x17\x1d\xde\xbdF\x97w\xbc\xba(\x86\xbcF\xd7\xb3<\xe9b\xfb\xbd\xd1EW=\xa3\x8b2;/\x9a\xcc=\x8c.\xab\xbd\xba(\x11=\x8c$x=F\xd7\xa5=\x8c\xde\xde<F\x17\x13=\x8c\xd6\xce\xbd\xba\x98\x19\xbd]\xb4\x11=/\x9aC>tq\xc3=\x00\xe0\xd1\xbd\xd1\x7f,\xbe\x00\x80\x12=\xa3K-</\xbaj<\xd1Es=]\xf4\xf9;F\x97\xb2<\x00\x00p;\x00@\xd3\xbd\x8cV<=\xa3\x0b\xae=]ti<\xba\xc8\xb0\xbd]tF\xbc\x17\xf6\x9b=\x8c.e\xbd\xe9\xa2\xf1<\x00\xe0e=\x8c\xae\x80=\xa3\x8b\x80\xbd\x00\xb0\x90=\xa3K\xac<]\xf4\x8d<\xa3\x0b\xe3\xbd\x00\x00D</\xba\x95\xbct\xd1e\xbb\x8c\xae\xc1;/\xba\x08>]\xf4\xfa=/\x1a\xd8\xbd]tG\xbc\x17=\xd2\xbdF\xd7\xa1\xbd\xba\xe82<\xe9\xe24\xbd\xe9\xa2\x17=]\xf4\xbe=\x00\x00\xd8;\xe9.\xa1=\x00\x00\x8e<F\x17u\xbd]t\xe9\xbcF\xf7\x84\xbd\x17\x1d\x97<\x8c.\x9d=\x8cN\x85\xbd\xd1\xa5\x80\xbd]\xb4\x91=\xd1E\xb6\xbc\x8c\x8e\x11=]\xf4\x14\xbd\xba\x88"\xbc\x8c\x81\xb7=tQ\x9f<]d\xf3<\x00\x00:='
vector_string = str(vector_bytes)
np.frombuffer(bytearray(vector_string,'utf-8'),dtype=np.float64)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-242-743df46933d0> in <module>
----> 1 np.frombuffer(bytearray(str(example),'utf-8'),dtype=np.float64)
ValueError: buffer size must be a multiple of element size
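The problem is that when you convert the numpy bytes to a str, you get the printable representation of the bytes rather than the bytes themselves: every non-printable byte is written out as an escape sequence that begins with a literal \ (e.g., the single byte \x00 becomes the four characters \x00, which the interpreter echoes as \\x00), and a literal \ byte is doubled to \\. This messes up decoding the string back to the numpy bytes object.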
In addition, the str() function adds b' at the start and ' at the end of the string, and these characters are then encoded as bytes as well.
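To see both effects on a small array (a two-element stand-in, not the vector from the question), compare the length of the raw bytes with the length of their str(). The failing call is the same one as in the question, and it raises the same ValueError:

```python
import numpy as np

raw = np.array([1.0, 2.0]).tobytes()  # 2 float64 values -> 16 raw bytes
text = str(raw)                       # printable repr of the bytes, not the bytes themselves

print(text)                 # wrapped in b'...', with \x00-style escapes spelled out
print(len(raw), len(text))  # 16 61

try:
    # UTF-8 encoding the repr gives 61 bytes, which is not a multiple of 8 (one float64)
    np.frombuffer(bytearray(text, "utf-8"), dtype=np.float64)
except ValueError as exc:
    print("ValueError:", exc)
```

Even when the length happens to be a multiple of 8, the result would be meaningless numbers, because the buffer holds the characters of the repr rather than the original bytes.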
The fix is to get rid of everything str() added (i.e., the escape sequences, the leading b' and the trailing '). The b' and the trailing ' are easily removed with the [2:-1] slice. Decoding with 'unicode-escape' turns each escape sequence back into a single character, and encoding with 'ISO-8859-1' maps each of those characters back to exactly one byte, which restores the original form (i.e., the form of vector_bytes).
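Here is the same sequence of steps applied one at a time to a small stand-in vector, so each intermediate value can be inspected. The values are chosen so that the bytes contain \x00 escapes, printable characters and a \n escape; the intermediate names are made up for illustration:

```python
import numpy as np

vector_np = np.array([1.0, -2.5, 3.25])    # small stand-in for the real vector
vector_string = str(vector_np.tobytes())   # the text that ends up in the CSV cell

as_bytes = vector_string.encode()              # the repr is pure ASCII, so this is just its text as bytes
unescaped = as_bytes.decode("unicode-escape")  # each escape (\x00, \n, ...) becomes one character U+0000..U+00FF
latin1 = unescaped.encode("ISO-8859-1")        # ISO-8859-1 maps U+0000..U+00FF 1:1 onto bytes 0x00..0xFF
restored = latin1[2:-1]                        # drop the leading b' and the trailing '

print(restored == vector_np.tobytes())            # True
print(np.frombuffer(restored, dtype=np.float64))  # same values as vector_np
```

ISO-8859-1 is used because it is the one standard codec whose code points 0 to 255 correspond one-to-one with byte values; UTF-8 would expand every character above 127 into two bytes.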
Here is the solution:
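import numpy as np

vector_bytes = vector_np.tobytes()
vector_bytes_str = str(vector_bytes)  # "b'...'" text with the escapes spelled out
vector_bytes_str_enc = vector_bytes_str.encode()
# 'unicode-escape' turns each escape back into one character, 'ISO-8859-1'
# maps those characters 1:1 to bytes, and [2:-1] strips the b' and the '
bytes_np_dec = vector_bytes_str_enc.decode('unicode-escape').encode('ISO-8859-1')[2:-1]
np.frombuffer(bytes_np_dec, dtype=np.float64)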
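Because str() of a bytes object is exactly the text of a Python bytes literal, another option is to let Python parse it with ast.literal_eval from the standard library, which handles the b'...' wrapper, the escapes and any quotes in one call. This is a swapped-in technique rather than the approach above; array_from_bytes_repr is a hypothetical helper name, and the sketch assumes the string really did come from str() of a bytes object:

```python
import ast

import numpy as np


def array_from_bytes_repr(text: str, dtype=np.float64) -> np.ndarray:
    """Hypothetical helper: rebuild an array from str(arr.tobytes())."""
    raw = ast.literal_eval(text)  # parses the b'...' literal, escapes included
    if not isinstance(raw, bytes):
        raise ValueError("expected the text of a bytes literal, e.g. b'...'")
    # frombuffer returns a read-only view; call .copy() if you need to modify it
    return np.frombuffer(raw, dtype=dtype)


vector_np = np.array([1.06229002e-09, -7.69618695e-10, 2.59891188e-36])
vector_string = str(vector_np.tobytes())
restored = array_from_bytes_repr(vector_string)
print(np.array_equal(restored, vector_np))  # True
```

Unlike eval(), ast.literal_eval only accepts literals, so a malformed or unexpected cell raises an exception instead of executing code.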
Cheers.