Alexandre Piccini

Reputation: 75

How to use UDP-acquired literal byte data

I've been using the following code to get the UDP data stream from a sender in my network:

import socket
import datetime

## Configs
UDP_IP = "169.254.67.186"
UDP_PORT = 5606 #PC1 uses 5606

## Creating socket object
sock = socket.socket(socket.AF_INET,
                     socket.SOCK_DGRAM)  # AF_INET: IPv4 addresses; SOCK_DGRAM: UDP datagrams
address = (UDP_IP,      # IP Address
           UDP_PORT)    # Port of that IP
sock.bind(address)

## Program startup message
timestamp = datetime.datetime.now().time()
print("Initiating data print at:",timestamp)
print("-------------------------------------------")

i = 0  # start at 0 so the loop below runs exactly 10 times

## Initiates loop to 'listen'
while i < 10:
    # Receive one datagram.
    # recvfrom() returns two values: the payload (bytes) and the sender's (IP, port) address.
    # The argument is the buffer size; 65535 bytes is enough for any single UDP datagram.
    data, senderaddr = sock.recvfrom(65535)
    print("Streaming:",data)
    i = i + 1
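A self-contained sketch of the same receive loop that runs without the real sender. Assumptions, not from the original setup: it binds to the loopback address 127.0.0.1 on an OS-chosen port, sends itself two made-up test payloads, and uses a hypothetical helper receive_packets(). It prints each packet with bytes.hex(" ") (Python 3.8+), which is easier to read than the raw bytes repr.

```python
import socket

def receive_packets(sock, count):
    """Hypothetical helper: receive `count` datagrams and return their payloads."""
    packets = []
    for _ in range(count):  # runs exactly `count` times, no manual counter needed
        data, _sender_addr = sock.recvfrom(65535)  # large enough for any UDP datagram
        packets.append(data)
    return packets

# Loopback setup (assumption) so the sketch runs on a single machine.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
receiver.settimeout(2.0)         # raise a timeout error instead of blocking forever

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
test_payloads = [b"\xd3\x04\x00\x00", b"\x01\x02\x03"]  # made-up sample bytes
for payload in test_payloads:
    sender.sendto(payload, receiver.getsockname())

packets = receive_packets(receiver, len(test_payloads))
for data in packets:
    print("Streaming:", data.hex(" "))  # e.g. "d3 04 00 00"

sender.close()
receiver.close()
```

Using a for loop over range(count) avoids the off-by-one risk of a hand-maintained counter.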

It is still in the testing phase, which is why I'm only receiving 10 packets of data before ending the while loop. Anyway, the variable 'data' currently starts with these values:

[Image: start of the 'data' variable values]

This is basically the format I'm getting from the stream. Looking around, I learned that this is a bytes literal in Python 3 (which I'm using), and I tried a few ways of decoding it into readable strings, none of which worked:

Method 1:

str(data, 'utf-8')

Traceback (most recent call last):
  File "", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte

Method 2:

import binascii
data.decode("utf-8")

Traceback (most recent call last):
  File "", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte

Neither of these helped. I noticed that this b"\xx0\x00\xx0\x00\x00\x00 ... format is not what the usual examples of bytes-literal conversion look like. In the threads I found, people describe their problem with something like b"abcdef" (without the backslashes that seem to separate individual characters), so I guess I might be missing something here. It also makes sense that the methods I'm trying are not the right ones, given this part of the error messages:

codec can't decode byte 0xd3 in position 0: invalid continuation byte

So, could you help me figure out what I'm missing here?

Thank you

Upvotes: 2

Views: 899

Answers (1)

andreihondrari

Reputation: 5833

That's because your data bytes object does not contain UTF-8 encoded text.
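A small sketch of why the stream prints as b'\x..' escapes rather than b'abcdef' (the sample bytes below are made up, not taken from the actual stream): a bytes object is a sequence of integers from 0 to 255, and its repr shows printable ASCII bytes as characters and every other value as a \xNN escape. The backslashes are only how the values are displayed.

```python
# A bytes object stores integers 0-255; its repr only *displays* them.
text_like = b"abcdef"                     # every byte is printable ASCII
binary = bytes([0xD3, 0x04, 0x00, 0x00])  # hypothetical sample, like the start of a binary packet
mixed = bytes([0x41, 0xD3])               # 0x41 is ASCII 'A', 0xD3 is not printable ASCII

print(repr(text_like))  # b'abcdef'
print(repr(binary))     # b'\xd3\x04\x00\x00'
print(repr(mixed))      # b'A\xd3'
print(list(binary))     # [211, 4, 0, 0]
print(binary.hex())     # d3040000
```

For binary data like this, indexing or iterating (which yields ints) or .hex() is usually more informative than trying to decode it as text.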

It makes sense that \xd3 followed by \x04 is not a valid UTF-8 sequence. According to the UTF-8 specification (see the Wikipedia article on UTF-8), code points from U+0080 to U+07FF are represented by two bytes of the form 110x xxxx and 10xx xxxx. More specifically, the Unicode Standard only allows \xc2 to \xdf for the first byte and \x80 to \xbf for the second byte (\xc0 and \xc1 would produce overlong encodings). Hence a \xd3 that is not followed by a byte between \x80 and \xbf is not a valid UTF-8 sequence, and \x04 is well outside that range.
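To make the two-byte rule concrete, here is a sketch that decodes one two-byte sequence by hand. decode_two_byte_utf8() is a hypothetical helper written for illustration, not a library API; in practice bytes.decode("utf-8") does this (and more) for you.

```python
def decode_two_byte_utf8(b1: int, b2: int) -> str:
    """Hypothetical helper: decode one 2-byte UTF-8 sequence by hand."""
    # Lead byte must be 110x xxxx and in 0xC2..0xDF (0xC0/0xC1 would be overlong).
    if not 0xC2 <= b1 <= 0xDF:
        raise ValueError(f"invalid lead byte {b1:#04x}")
    # Continuation byte must be 10xx xxxx, i.e. 0x80..0xBF.
    if not 0x80 <= b2 <= 0xBF:
        raise ValueError(f"invalid continuation byte {b2:#04x}")
    # Code point = 5 payload bits from the lead byte + 6 from the continuation byte.
    code_point = ((b1 & 0x1F) << 6) | (b2 & 0x3F)
    return chr(code_point)

ch = decode_two_byte_utf8(0xD3, 0x80)
print(hex(ord(ch)))                       # 0x4c0
print(ch == b"\xd3\x80".decode("utf-8"))  # True

try:
    decode_two_byte_utf8(0xD3, 0x04)
except ValueError as exc:
    print(exc)                            # invalid continuation byte 0x04
```

For \xd3\x80: 0xD3 & 0x1F = 0b10011 and 0x80 & 0x3F = 0, so the code point is 0b10011 << 6 = 0x4C0.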

Demonstration:

b'\xd3\x80'.decode('utf-8') => 'Ӏ' (CYRILLIC LETTER PALOCHKA, U+04C0)

If the second byte goes below \x80, e.g. b'\xd3\x79'.decode('utf-8'), it throws a UnicodeDecodeError, because \x79 is 0111 1001 in binary and does not match the 10xx xxxx pattern that UTF-8 requires for a continuation byte.
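The demonstration can be reproduced with a short script. It uses the standard unicodedata module to look up the character's name and the reason/start attributes of UnicodeDecodeError; the byte pairs are the ones discussed above.

```python
import unicodedata

# Valid: lead byte 0xD3 followed by a continuation byte in 0x80..0xBF.
ch = b"\xd3\x80".decode("utf-8")
print(f"U+{ord(ch):04X}", unicodedata.name(ch))  # U+04C0 CYRILLIC LETTER PALOCHKA

# Invalid: 0x79 and 0x04 are below 0x80, so neither is a continuation byte.
reasons = {}
for raw in (b"\xd3\x79", b"\xd3\x04"):
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        reasons[raw] = (exc.reason, exc.start)
        print(raw, "->", exc.reason, "at position", exc.start)
```

Note that b"\xd3\x79" prints as b'\xd3y', since 0x79 is the printable ASCII letter 'y'.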

Upvotes: 1
