Reputation: 75
I've been using the following code to get the UDP data stream from a sender in my network:
import socket
import datetime

## Configs
UDP_IP = "169.254.67.186"
UDP_PORT = 5606  # PC1 uses 5606

## Creating socket object
# AF_INET specifies that IPv4 addresses are used; SOCK_DGRAM specifies UDP.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
address = (UDP_IP,    # IP address
           UDP_PORT)  # Port on that IP
sock.bind(address)

## Program startup message
timestamp = datetime.datetime.now().time()
print("Initiating data print at:", timestamp)
print("-------------------------------------------")

i = 1
## Initiates loop to 'listen'
while i < 10:
    # Receive data. The argument is the buffer size (maximum amount of data
    # received at once). Two values are returned: the data and the sender's address.
    data, senderaddr = sock.recvfrom(10240000)
    print("Streaming:", data)
    i = i + 1
It is still in a testing period, which is why I'm only receiving 10 packets of data and then ending the while loop. Anyway, the start of the variable 'data' currently looks like this:
Start of 'data' variable values
This is basically the format I'm getting from the stream. Looking around, I learned that this is a Python 3 bytes literal (I'm using Python 3), and that there are a few ways of decoding it into a useful string. None of them worked, for example:
Method 1:
str(data, 'utf-8')
Traceback (most recent call last):
  File "", line 1, in
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte
Method 2:
import binascii
data.decode("utf-8")
Traceback (most recent call last):
  File "", line 1, in
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte
None of these really helped me. I noticed this format of b"\xx0\x00\xx0\x00\x00\x00 ... is not the common example used for literal bytes conversion. In the threads I found people are using more the format b"abcdef" to describe their problem (without backslashes, that seem to separate individual characters), so I guess I might be missing something here. It makes sense to think the methods I'm trying to use are not the right ones because of this part of the error messages:
codec can't decode byte 0xd3 in position 0: invalid continuation byte
So, could you guys help me by telling me what I'm missing here?
Thank you
Upvotes: 2
Views: 899
Reputation: 5833
That's because your data bytes object does not represent UTF-8 encoded data.
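Since the payload is not UTF-8 text, one option is to inspect it as raw bytes instead of decoding it. Below is a minimal sketch: the sample value is made up (only the leading \xd3\x04 comes from the question), and the struct layout is purely an assumption for illustration, since the sender's actual packet format isn't given here.

```python
import struct

# Hypothetical sample: only the leading \xd3\x04 is taken from the question;
# the remaining bytes are invented for illustration.
sample = b"\xd3\x04\x00\x00\x01\x00"

# View the bytes without treating them as text.
print(sample.hex())    # hexadecimal string
print(list(sample))    # integer value of each byte

# ASSUMED layout (not the real protocol): three little-endian unsigned 16-bit ints.
first, second, third = struct.unpack("<HHH", sample)
print(first, second, third)
```

If the sender documents its packet layout, the format string passed to struct.unpack would need to be adapted to match it.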
It makes sense that \xd3 followed by \x04 is not a valid UTF-8 sequence. According to the UTF-8 specification (Wikipedia link), code points between 0x80 and 0x7FF are represented by two bytes of the form 110x xxxx and 10xx xxxx. More specifically, under the Unicode Standard (which excludes the overlong lead bytes \xc0 and \xc1), this means anywhere from \xc2 to \xdf for the first byte and \x80 to \xbf for the second byte. Hence a \xd3 that is not followed by a byte between \x80 and \xbf is not a valid UTF-8 sequence.
Demonstration:
b'\xd3\x80'.decode('utf-8')
=> 'Ӏ'
(which is CYRILLIC LETTER PALOCHKA, U+04C0)
If the second byte drops below \x80, for example:
b'\xd3\x79'.decode('utf-8')
it throws a UnicodeDecodeError, because \x79 is 0111 1001 in binary, which does not match the 10xx xxxx pattern that UTF-8 requires for a continuation byte.
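The byte ranges above can be checked directly. The following is a small sketch, not part of any library: the helper name is_valid_two_byte_utf8 is hypothetical, and it only covers the two-byte case discussed here. It compares the range check against what Python's own decoder does.

```python
def is_valid_two_byte_utf8(pair: bytes) -> bool:
    """Return True if `pair` is a well-formed two-byte UTF-8 sequence."""
    if len(pair) != 2:
        return False
    lead, cont = pair  # iterating over bytes yields integers
    # Lead byte must be 0xC2..0xDF, continuation byte must be 0x80..0xBF.
    return 0xC2 <= lead <= 0xDF and 0x80 <= cont <= 0xBF


for candidate in (b"\xd3\x80", b"\xd3\x79", b"\xd3\x04"):
    try:
        candidate.decode("utf-8")
        decoded_ok = True
    except UnicodeDecodeError:
        decoded_ok = False
    # The range check and the real decoder should agree for each candidate.
    print(candidate, is_valid_two_byte_utf8(candidate), decoded_ok)
```

Only the first candidate passes both checks; the other two have a second byte below \x80, which is why the decoder rejects them.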
Upvotes: 1