Reputation: 1
I have a socket listener in C# .NET that accepts connections, mainly from Russian and Chinese clients, which may send the server data containing non-Latin characters. How can I determine the right encoding for the incoming socket data? I used the code below, but for anything other than Latin characters it seems to return only ????? characters.
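byte[] buffer = new byte[1024];
int iRx = m_socWorker.Receive(buffer);
// UTF-8 never produces more chars than bytes, so iRx chars is always enough room.
char[] chars = new char[iRx];
System.Text.Decoder d = System.Text.Encoding.UTF8.GetDecoder();
int charLen = d.GetChars(buffer, 0, iRx, chars, 0);
// charLen is smaller than iRx when the data contains multi-byte characters,
// so build the string from the decoded part only (no trailing '\0' chars).
System.String szData = new System.String(chars, 0, charLen);
txtDataRx.Text = szData;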
Upvotes: 0
Views: 310
Reputation: 26446
An encoding is an agreement on how to write characters as a series of bytes. You cannot look at a series of bytes and reliably determine which encoding was used to create them.
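To illustrate why the bytes alone do not reveal their encoding, here is a minimal sketch that decodes one byte sequence three different ways. The class and method names (`SameBytesDemo`, `DecodeAllWays`) are made up for this example, and the three encodings are just ones that ship with .NET by default. Every call succeeds without an exception; only one result is the text the sender meant.

```csharp
using System;
using System.Text;

public static class SameBytesDemo
{
    // Hypothetical helper: decodes the same bytes under three encodings.
    // None of the calls fails, which is why the bytes cannot "tell" you
    // which encoding produced them.
    public static string[] DecodeAllWays(byte[] data)
    {
        return new[]
        {
            Encoding.UTF8.GetString(data),                      // what the sender meant
            Encoding.GetEncoding("iso-8859-1").GetString(data), // one char per byte
            Encoding.Unicode.GetString(data),                   // UTF-16 little-endian
        };
    }

    public static void Main()
    {
        // 6 Cyrillic letters become 12 bytes in UTF-8.
        byte[] data = Encoding.UTF8.GetBytes("Привет");
        foreach (string s in DecodeAllWays(data))
        {
            Console.WriteLine("{0} chars: {1}", s.Length, s);
        }
    }
}
```

Only the first result is readable Russian; the other two are equally "valid" strings as far as the decoder is concerned.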
Your code currently uses UTF-8 to decode the data. UTF-8 is capable of representing Russian and Chinese characters, but you must ensure that the clients encode their data as UTF-8 as well.
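As a rough sketch of what "both sides use UTF-8" can look like: the sender calls `Encoding.UTF8.GetBytes`, and the receiver feeds every received chunk through a single `Decoder`. The names `Utf8RoundTrip` and `DecodeChunks` are hypothetical, and the byte arrays stand in for the results of successive `Receive` calls; this is an assumption about how the data arrives, not the asker's actual protocol. Reusing one decoder matters because a multi-byte character can be split across two reads.

```csharp
using System;
using System.Text;

public static class Utf8RoundTrip
{
    // Decodes chunks in arrival order with ONE decoder, so the trailing
    // bytes of an incomplete character are kept until the next chunk.
    public static string DecodeChunks(byte[][] chunks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var text = new StringBuilder();
        foreach (byte[] chunk in chunks)
        {
            // GetCharCount takes the decoder's buffered bytes into account.
            char[] chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length)];
            int n = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            text.Append(chars, 0, n);
        }
        return text.ToString();
    }

    public static void Main()
    {
        // Client side: encode with the agreed encoding.
        byte[] sent = Encoding.UTF8.GetBytes("Привет, 你好");

        // Simulate two reads that cut the second letter in half
        // (each Cyrillic letter takes 2 bytes in UTF-8).
        byte[] first = new byte[3];
        byte[] second = new byte[sent.Length - 3];
        Array.Copy(sent, 0, first, 0, first.Length);
        Array.Copy(sent, 3, second, 0, second.Length);

        // Server side: decode with the same encoding.
        Console.WriteLine(DecodeChunks(new[] { first, second }));
    }
}
```

If the clients turn out to use something else (a legacy code page, for instance), the receiver has to be told which one, for example through a header field the protocol defines.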
Furthermore, to display these characters you need a font that supports the languages. Russian support is quite common, since the alphabet has a small, fixed number of characters, while Chinese has thousands of different "letters" and you might need to download a font to make them appear on your screen. Usually, though, unsupported characters are displayed as empty squares, while question marks are generated when characters are used that the encoding does not support.
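The question-mark case can be reproduced without any socket or font involved. The sketch below (class and method names are invented for illustration) pushes Russian text through ASCII, which cannot represent it, and also decodes three bytes that are not valid UTF-8. With .NET's default settings the first case yields literal `?` characters and the second yields the replacement character U+FFFD, which some controls and consoles may also render as a question mark.

```csharp
using System;
using System.Text;

public static class QuestionMarkDemo
{
    // Encodes text with an encoding that cannot represent it, then decodes it
    // again. With the default settings each unsupported char becomes '?'.
    public static string ThroughAscii(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(bytes);
    }

    // Decodes bytes as UTF-8 even though they may not be UTF-8.
    // Invalid sequences are replaced with U+FFFD by default.
    public static string AsUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        Console.WriteLine(ThroughAscii("Привет"));

        // These three bytes spell "При" in Windows-1251, a common legacy
        // Russian code page; as UTF-8 they are invalid.
        byte[] legacy = { 0xCF, 0xF0, 0xE8 };
        string decoded = AsUtf8(legacy);
        Console.WriteLine(decoded.Contains("\uFFFD")
            ? "contains U+FFFD replacement characters"
            : decoded);
    }
}
```

So if the text box shows `?????` rather than empty squares, an encoding mismatch between client and server is a more likely cause than a missing font.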
Joel has written an article with more (basic) information about encodings that you might want to read.
Upvotes: 1