Kevin

Reputation: 28

Python Socket Data looks like unicode but cannot be translated

I'm new to Python CGI and want to port a Minecraft MOTD script from my PHP project. It uses a socket to get the data from the server.

Here's my source code:

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    host = "example.com"
    port = 25565
    s.connect((host, port))
    s.sendall(b"\xFE\x01")  # legacy server list ping request
    msg = s.recv(4096)
    s.close()  # close must be called; a bare s.close does nothing
    print(msg)

It connects to the server and receives the MOTD data, but the data looks weird:

b'\xff\x00>\x00\xa7\x001\x00\x00\x001\x002\x007\x00\x00\x00B\x00u\x00n\x00g\x00e\x00e\x00C\x00o\x00r\x00d\x00 \x001\x00.\x008\x00.\x00x\x00-\x001\x00.\x001\x002\x00.\x00x\x00\x00\x00\xa7\x00f\x00\xa7\x001\x00A\x00n\x00o\x00t\x00h\x00e\x00r\x00 \x00B\x00u\x00n\x00g\x00e\x00e\x00 \x00s\x00e\x00r\x00v\x00e\x00r\x00\x00\x006\x00\x00\x002\x000\x000'

I tried to work out what it is. It looks like UTF-16, so I tried this to solve the problem:

msg.decode('UTF-16')

but sadly, it didn't work:

UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x30 in position 126: truncated data 
      args = ('utf-16-le', b'\xff\x00>\x00\xa7\x001\x00\x00\x001\x002\x007\...0v\x00e\x00r\x00\x00\x006\x00\x00\x002\x000\x000', 126, 127, 'truncated data') 
      encoding = 'utf-16-le' 
      end = 127 
      object = b'\xff\x00>\x00\xa7\x001\x00\x00\x001\x002\x007\...0v\x00e\x00r\x00\x00\x006\x00\x00\x002\x000\x000' 
      reason = 'truncated data' 
      start = 126 
      with_traceback = <built-in method with_traceback of UnicodeDecodeError object>

Python couldn't translate these bytes to text, which confused me a lot. I'm new to programming. Is there any way to solve this problem?

Upvotes: 0

Views: 432

Answers (2)

Mark Tolonen

Reputation: 178179

According to http://wiki.vg/Server_List_Ping#1.6, the first three bytes are a packet ID and size of the following UTF-16BE string:

Server to client

The server responds with a 0xFF kick packet. The packet begins with a single byte identifier ff, then a two-byte big endian short giving the length of the following string in characters. You can actually ignore the length because the server closes the connection after the response is sent.

After the first 3 bytes, the packet is a UTF-16BE string. It begins with two characters: §1, followed by a null character. On the wire these look like 00 a7 00 31 00 00.

The remainder is null character (that is 00 00) delimited fields:

  1. Protocol version (e.g. 74)
  2. Minecraft server version (e.g. 1.8.7)
  3. Message of the day (e.g. A Minecraft Server)
  4. Current player count
  5. Max players

So strip the first three bytes, then decode:

>>> data = b'\xff\x00>\x00\xa7\x001\x00\x00\x001\x002\x007\x00\x00\x00B\x00u\x00n\x00g\x00e\x00e\x00C\x00o\x00r\x00d\x00 \x001\x00.\x008\x00.\x00x\x00-\x001\x00.\x001\x002\x00.\x00x\x00\x00\x00\xa7\x00f\x00\xa7\x001\x00A\x00n\x00o\x00t\x00h\x00e\x00r\x00 \x00B\x00u\x00n\x00g\x00e\x00e\x00 \x00s\x00e\x00r\x00v\x00e\x00r\x00\x00\x006\x00\x00\x002\x000\x000'
>>> data[3:].decode('utf-16be').split('\x00')
['§1', '127', 'BungeeCord 1.8.x-1.12.x', '§f§1Another Bungee server', '6', '200']
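The strip-and-decode step above can be wrapped in a small function. This is only a sketch: `parse_legacy_ping` and the dictionary keys are names made up for illustration, and the sample packet is rebuilt from the field values shown in the output above rather than captured from a live server.

```python
import struct


def parse_legacy_ping(packet: bytes) -> dict:
    """Hypothetical helper: split a 0xFF kick packet into named fields."""
    if len(packet) < 3 or packet[0] != 0xFF:
        raise ValueError("not a 0xFF kick packet")
    # Bytes 1-2: big-endian short, string length in UTF-16 code units.
    (length,) = struct.unpack(">H", packet[1:3])
    payload = packet[3:3 + 2 * length]
    if len(payload) != 2 * length:
        raise ValueError("packet is shorter than its header claims")
    fields = payload.decode("utf-16-be").split("\x00")
    if len(fields) != 6 or fields[0] != "\xa71":
        raise ValueError("unexpected field layout")
    _, protocol, version, motd, online, max_players = fields
    return {
        "protocol": int(protocol),
        "version": version,
        "motd": motd,
        "online": int(online),
        "max_players": int(max_players),
    }


# Rebuild the packet from the question out of its decoded fields.
text = "\x00".join(
    ["\xa71", "127", "BungeeCord 1.8.x-1.12.x",
     "\xa7f\xa71Another Bungee server", "6", "200"]
)
sample = b"\xff" + struct.pack(">H", len(text)) + text.encode("utf-16-be")
info = parse_legacy_ping(sample)
print(info["motd"], info["online"], info["max_players"])
```

Checking the length header is optional, as the quoted text says, but it catches a response that was cut short because a single recv() call returned only part of it.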

Upvotes: 0

Nepho

Reputation: 1112

You're getting a "truncated data" error because your data is, indeed, truncated. It is 127 bytes long, and decoding UTF-16 requires an even number of bytes (126 or 128, for example).

Removing the last three bytes (0\x000), which leaves an even 124 bytes, and decoding gives the following:

>>> a = b'\xff\x00>\x00\xa7\x001\x00\x00\x001\x002\x007\x00\x00\x00B\x00u\x00n\x00g\x00e\x00e\x00C\x00o\x00r\x00d\x00 \x001\x00.\x008\x00.\x00x\x00-\x001\x00.\x001\x002\x00.\x00x\x00\x00\x00\xa7\x00f\x00\xa7\x001\x00A\x00n\x00o\x00t\x00h\x00e\x00r\x00 \x00B\x00u\x00n\x00g\x00e\x00e\x00 \x00s\x00e\x00r\x00v\x00e\x00r\x00\x00\x006\x00\x00\x002\x00'
>>> a.decode("utf-16")
u'\xff>\xa71\x00127\x00BungeeCord 1.8.x-1.12.x\x00\xa7f\xa71Another Bungee server\x006\x002'
>>> 
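To see why the odd byte count matters, and why a little-endian decode only appears to work, here is a small sketch on a made-up 7-byte packet: a 3-byte header (0xFF plus a big-endian length of 2) followed by just "§1". The name `raw` and its contents are invented for illustration and are not real server output.

```python
# Hypothetical packet: 0xFF id, length 2 (big endian), then "§1" in UTF-16BE.
raw = b"\xff\x00\x02\x00\xa7\x001"

# 1. Seven bytes cannot be split into 2-byte code units, so decoding fails.
try:
    raw.decode("utf-16-le")
    odd_length_error = None
except UnicodeDecodeError as exc:
    odd_length_error = exc.reason

# 2. Trimming one byte makes the length even, but the 3-byte header shifts
#    every pair by one: header bytes become characters and the "1" is lost.
misaligned = raw[:-1].decode("utf-16-le")

# 3. Skipping the header first keeps the pairs aligned as big-endian text.
aligned = raw[3:].decode("utf-16-be")

print(repr(misaligned))
print(repr(aligned))
```

The little-endian result looks plausible only because most of the characters have a zero high byte. The header bytes still end up in the text and the final character is dropped, so skipping the first three bytes and decoding as big-endian is the more reliable route.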

Upvotes: 1
