Reputation: 113
This is a very strange situation: the split function appears to change the string format. Please look at the code below.
Code:
import serial

# COM_PortName is defined elsewhere
COM_Port = serial.Serial(COM_PortName)
with COM_Port as port:
    while True:
        RxedData = port.readline()
        line = RxedData.decode('utf-8')
        print("Line 1: ", line)
        row = line.split(',')[1:-1]
        print("Line 2: ", row)
Output:
Line 1: "* , 0 0 0 0 0 5 7 5 , 2 3 : 0 3 : 4 7 , 1 1 / 0 2 / 2 0 , 1 2 . 3 4 5 , K P A , 0 0 0 0 6 . 8 3 , S L P M , T B , , $ "
Line 2: ['\x000\x000\x000\x000\x000\x006\x002\x001\x00', '\x002\x000\x00:\x004\x006\x00:\x005\x001\x00', '\x001\x002\x00/\x000\x002\x00/\x002\x000\x00', '\x001\x002\x00.\x003\x004\x005\x00', '\x00K\x00P\x00A\x00', '\x000\x000\x000\x000\x000\x00.\x000\x000\x00', '\x00C\x00C\x00P\x00M\x00', '\x00T\x00G\x00', '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']
How did Line 2 turn into \x000\x000...? What is this encoding format? How do I get it into the right format?
Edit 1:
print([hex(i) for i in RxedData])
Output:
['0x2a', '0x0', '0x2c', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x30', '0x0', '0x31', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x31', '0x0', '0x3a', '0x0', '0x35', '0x0', '0x31', '0x0', '0x3a', '0x0', '0x35', '0x0', '0x30', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x33', '0x0', '0x2f', '0x0', '0x30', '0x0', '0x32', '0x0', '0x2f', '0x0', '0x32', '0x0', '0x30', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x32', '0x0', '0x2e', '0x0', '0x33', '0x0', '0x34', '0x0', '0x35', '0x0', '0x2c', '0x0', '0x4b', '0x0', '0x50', '0x0', '0x41', '0x0', '0x2c', '0x0', '0x31', '0x0', '0x32', '0x0', '0x33', '0x0', '0x34', '0x0', '0x35', '0x0', '0x2e', '0x0', '0x36', '0x0', '0x36', '0x0', '0x2c', '0x0', '0x53', '0x0', '0x4c', '0x0', '0x50', '0x0', '0x48', '0x0', '0x2c', '0x0', '0x0', '0x0', '0x0', '0x0', '0x2c', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2d', '0x0', '0x2c', '0x0', '0x24', '0x0', '0xa']
Upvotes: 1
Views: 10884
Reputation: 148965
OK, from the hex dump of the received bytes, it appears that each ASCII character is followed by a NULL byte (\x00). That is simply the UTF-16-LE representation of the characters. The UTF-8 decode keeps the code points of the initial bytes because they are all below 128, and it leaves all the interleaved nulls in place. And you cannot simply decode the byte string as UTF-16 (which is what it really is) because you got it through a readline, which stopped right after the newline byte and has not read the null byte that follows it.
If you read another line, it would probably start with that null byte, making the line appear to be UTF-16-BE encoded...
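To make the diagnosis concrete, here is a minimal sketch that needs no serial port. The byte string is a short made-up sample (not the actual device data), cut right after the newline byte the way readline would return it:

```python
# Hypothetical sample: "*,05\n" encoded as UTF-16-LE, then cut right after
# the newline byte, so the NUL that belongs to the newline is missing.
raw = "*,05\n".encode("utf-16-le")[:-1]

# UTF-8 decoding succeeds but keeps every NUL as a real character.
as_utf8 = raw.decode("utf-8")

# UTF-16-LE decoding fails because the byte count is odd (truncated data).
try:
    raw.decode("utf-16-le")
    utf16_failed = False
except UnicodeDecodeError:
    utf16_failed = True

# Restoring the missing NUL byte makes the UTF-16-LE decode work.
as_utf16 = (raw + b"\x00").decode("utf-16-le")

print(repr(as_utf8))
print(utf16_failed)
print(repr(as_utf16))
```

The first repr shows the interleaved \x00 characters from the question, and the last one shows the clean text.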
What can be done then?
A trivial workaround is just to get rid of the null bytes. If you can be sure that you will only get plain ASCII characters (no accented ones like é, no emoticons, no Greek or Cyrillic ones, etc.), this would be enough:
RxedData = port.readline()
line = RxedData.replace(b'\x00', b'').decode('ascii')
print("Line 1: ", line)
row = line.split(',')[1:-1]
print("Line 2: ", row)
With the byte values from the hex dump in your edit, you should obtain:
Line 1: *,00000001,11:51:50,13/02/20,12.345,KPA,12345.66,SLPH,,--------,$
Line 2: ['00000001', '11:51:50', '13/02/20', '12.345', 'KPA', '12345.66', 'SLPH', '', '--------']
The good point is that it is simple and robust, provided you only have plain ASCII.
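The same workaround can be packaged as a small helper and tried without any hardware. This is only a sketch: parse_fields is a hypothetical name, and the sample frame is a shortened, made-up line in the same layout as the data in the question.

```python
def parse_fields(raw: bytes) -> list:
    """Drop NUL bytes, decode as ASCII, and return the fields between
    the first and the last comma-separated items."""
    line = raw.replace(b"\x00", b"").decode("ascii")
    return line.split(",")[1:-1]


# Hypothetical shortened frame, cut after the newline byte like readline does.
sample = "*,00000001,12.345,KPA,$\n".encode("utf-16-le")[:-1]
print(parse_fields(sample))
```

Because the decode uses 'ascii', any character outside plain ASCII raises a UnicodeDecodeError instead of silently producing wrong text, which is arguably the safer failure mode for this workaround.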
The encoding-conformant way would be to use a TextIOWrapper around the serial port and specify the UTF-16-LE encoding in it. I could not test it (no serial port on my box and no need for it), so I am only guessing at what should be done.
import io
import serial

COM_Port = serial.Serial(COM_PortName)
with io.TextIOWrapper(io.BufferedRWPair(COM_Port, COM_Port), encoding='utf-16-le') as port:
    while True:
        line = port.readline()
        print("Line 1: ", line)
        row = line.split(',')[1:-1]
        print("Line 2: ", row)
Here, the TextIOWrapper will take care of the null byte following the newline byte, and will give you directly true unicode strings.
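The decoding behaviour of TextIOWrapper can at least be checked without a serial port by wrapping an in-memory stream. In this sketch, io.BytesIO stands in for the port and the two frames are made-up data, so it says nothing about timeouts or buffering on a real device:

```python
import io

# Stand-in for the serial port: two hypothetical frames, fully UTF-16-LE
# encoded, so each newline is followed by its NUL byte.
fake_port = io.BytesIO("*,1,KPA,$\n*,2,KPA,$\n".encode("utf-16-le"))

rows = []
with io.TextIOWrapper(fake_port, encoding="utf-16-le") as port:
    for line in port:
        # Each line is already a clean str, with no embedded NULs.
        rows.append(line.split(",")[1:-1])

print(rows)
```

The wrapper decodes the stream as a whole, so the NUL byte after each newline is consumed as part of that newline character rather than leaking into the next line.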
Upvotes: 5
Reputation: 1212
Decided to turn my comment into an answer (mostly so I could include code).
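The two lines are printing differently because you're printing different things. On Line 1, you're printing a string directly, so its characters are sent to the console as they are. On Line 2, you're now printing a list, so the byte interpretation doesn't happen during the print: each element is shown as its repr, with non-printable characters escaped.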
Your line string, after the decode, very likely has \x00 (NULL) bytes embedded in it instead of ASCII spaces (\x20).
>>> x = '*\x00,\x000\x000\x000\x000\x000\x005\x007\x005\x00'
>>> print(x)
* , 0 0 0 0 0 5 7 5
>>> print(x.split(','))
['*\x00', '\x000\x000\x000\x000\x000\x005\x007\x005\x00']
To amend my quoted comment, this appears to depend on whichever console is printing the characters. I get the above output from cmd and PowerShell, but Jupyter Notebook instead prints this: *,00000575. Note the "spaces" are now gone.
If I change a few of the \x00 bytes to \x20 instead, Jupyter will then print what you're seeing above (in the positions where they were replaced, at least). This is just to show that NULL characters and space characters can look identical, depending on the console displaying them.
>>> x = '*\x20,\x200\x200\x000\x000\x000\x005\x007\x005\x00'
>>> print(x)
* , 0 0000575
Edit for your comment:
How to make it interprets correctly?
It depends what "correctly" means to you. In essence, everything has been interpreted correctly - your serial port is just sending across NULL bytes instead of space characters.
If you would rather have ASCII spaces instead of NULL bytes, though, you can do a simple replace on the string (the output below is from Jupyter, which displays NULLs as nothing). You can also just use ' ' instead of '\x20' if you prefer.
>>> print(x.replace('\x00', '\x20').split(','))
['* ', ' 0 0 0 0 0 5 7 5 ']
Upvotes: 3