Reputation: 303
I am trying to make a very simple chat program with a server written in Python and a client written in Java. However, I have no idea how to decode the data that the server receives from the client. The client encodes its messages as UTF-8 before sending them.
Just printing it looks like this: https://i.sstatic.net/qAxHL.jpg
And decoding from UTF-8 first it looks like this: https://i.sstatic.net/oXMph.jpg
I assume that the NUL character (\x00) can be removed, and the same goes for the b'' which wraps the entire message. The second character seems to specify the length of the message. But how do I decode this? Should I just remove characters manually? I know this is quite a basic question and has probably been asked before, but I don't even know what to search for.
Upvotes: 3
Views: 2732
Reputation: 22261
In the Java client I have a DataOutputStream object which I use with this method: out.writeUTF(input);
According to the documentation of that method, it doesn't write plain UTF-8 to the output stream. It says "First, two bytes are written to the output stream", which explains the 16-bit length that precedes each string. And even after that it doesn't write UTF-8: it writes Java's own idiosyncratic encoding, which it calls Modified UTF-8 and which is actually a variant of CESU-8, not UTF-8.
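If you decide to keep writeUTF on the Java side, the Python server has to undo both layers: the two-byte big-endian length prefix and the Modified UTF-8 payload. Below is a minimal sketch of that, not a definitive implementation. The function names read_java_utf and decode_modified_utf8 are made up for illustration, and the code assumes the data arrives through a binary file-like object (for a socket, something like conn.makefile("rb")). It relies on the two ways Modified UTF-8 differs from UTF-8: U+0000 is written as the bytes C0 80, and characters outside the Basic Multilingual Plane are written as two separately encoded surrogates.

```python
import io
import struct


def decode_modified_utf8(payload: bytes) -> str:
    """Decode Java's Modified UTF-8 (hypothetical helper, see lead-in)."""
    # Java writes U+0000 as the overlong pair C0 80; map it back to a real NUL.
    payload = payload.replace(b"\xc0\x80", b"\x00")
    # Supplementary characters arrive as two 3-byte surrogates, so let the
    # UTF-8 decoder pass lone surrogates through instead of raising.
    text = payload.decode("utf-8", errors="surrogatepass")
    # A round trip through UTF-16 joins each surrogate pair into one character.
    return text.encode("utf-16-le", errors="surrogatepass").decode("utf-16-le")


def read_java_utf(stream) -> str:
    """Read one string written by DataOutputStream.writeUTF from a binary stream."""
    header = stream.read(2)
    if len(header) < 2:
        raise EOFError("stream closed before the 2-byte length prefix")
    (length,) = struct.unpack(">H", header)  # unsigned 16-bit, big-endian
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("stream closed in the middle of a message")
    return decode_modified_utf8(payload)


# Simulated wire data for writeUTF("hi"): length 2, then the two ASCII bytes.
print(read_java_utf(io.BytesIO(b"\x00\x02hi")))  # prints: hi
```

For pure ASCII text the payload is identical to UTF-8, which is why a plain .decode("utf-8") on the received bytes appears to work apart from the two leading length bytes.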
So first of all, you need to clarify what format exactly you wish to use to communicate between the client and server: the protocol. Is it plain UTF-8? Is it the bizarre structured encoding that writeUTF emits? Is it something else? Then write both your client and server to follow that specification.
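As one possible answer to the "plain UTF-8" option, here is a hedged sketch of a simple protocol you could specify yourself: each message is a 4-byte big-endian length followed by that many bytes of standard UTF-8. This framing is an assumption for illustration, not something from your existing code, and the names encode_message and read_message are hypothetical. On the Java side it would correspond to something like out.writeInt(bytes.length) followed by out.write(bytes), with bytes obtained from input.getBytes(StandardCharsets.UTF_8).

```python
import io
import struct


def encode_message(text: str) -> bytes:
    """Frame a string as: 4-byte big-endian length + UTF-8 bytes (assumed protocol)."""
    data = text.encode("utf-8")
    return struct.pack(">I", len(data)) + data


def read_message(stream) -> str:
    """Read one framed message from a binary file-like object."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream closed before the 4-byte length prefix")
    (length,) = struct.unpack(">I", header)
    data = stream.read(length)
    if len(data) < length:
        raise EOFError("stream closed in the middle of a message")
    return data.decode("utf-8")  # real UTF-8, so no special cases are needed


# Round trip through an in-memory stream standing in for the socket.
wire = io.BytesIO(encode_message("héllo") + encode_message("bye"))
print(read_message(wire))  # prints: héllo
print(read_message(wire))  # prints: bye
```

The explicit length prefix lets the server know where each message ends without scanning for a delimiter, and because both sides use standard UTF-8 there is nothing Java-specific left to decode.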
Upvotes: 3