RedEagle

Reputation: 4570

Character encoding

For this piece of code:

String content = String.Empty;
ListenerStateObject state = (ListenerStateObject)ar.AsyncState;
Socket handler = state.workSocket;

int bytesRead = handler.EndReceive(ar);

if (bytesRead > 0)
{
   state.sb.Append(Encoding.UTF8.GetString(state.buffer, 0, bytesRead));

   content = state.sb.ToString();
   ...

I'm getting 'Ol?' instead of 'Olá'.

What's wrong with it?

Upvotes: 9

Views: 284

Answers (3)

JacquesB

Reputation: 42689

Are you sure that the stream is actually UTF-8 encoded? Try inspecting the raw bytes in the buffer before decoding and see what the actual byte values are. For 'Olá' in UTF-8 there should be 4 bytes, because 'á' takes two.
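As a rough sketch of that check, the snippet below dumps the bytes as hex so they can be compared against what UTF-8 would produce. The class and method names (`InspectBytes`, `ToHex`) are made up for illustration, and ISO-8859-1 is only an assumed example of what the sender might be using instead; the question does not say.

```csharp
using System;
using System.Text;

public static class InspectBytes
{
    // Formats the first `count` bytes of `buffer` as hex pairs separated by dashes.
    public static string ToHex(byte[] buffer, int count)
    {
        return BitConverter.ToString(buffer, 0, count);
    }

    public static void Main()
    {
        // "Ol\u00E1" is "Olá"; the escape avoids depending on the source file's encoding.
        byte[] utf8 = Encoding.UTF8.GetBytes("Ol\u00E1");
        // Assumption: ISO-8859-1 stands in for "some single-byte encoding".
        byte[] latin1 = Encoding.GetEncoding("ISO-8859-1").GetBytes("Ol\u00E1");

        Console.WriteLine(ToHex(utf8, utf8.Length));     // 4F-6C-C3-A1
        Console.WriteLine(ToHex(latin1, latin1.Length)); // 4F-6C-E1
    }
}
```

In the receive callback the equivalent call would be `BitConverter.ToString(state.buffer, 0, bytesRead)`. If that shows three bytes ending in E1 rather than four ending in C3-A1, the sender is probably not sending UTF-8.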

Upvotes: 1

Henk Holterman

Reputation: 273844

Most likely it's the wrong encoding.

But if you use this code to receive blocks of bytes (split by a protocol), you have a serious flaw: there is no guarantee that the blocks were encoded independently.

Simple case: the boundary between two blocks cuts through a multi-byte encoded char, so each block on its own contains an incomplete byte sequence.

Best solution: attach a TextReader (for example a StreamReader created with the right encoding) to your Stream.
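If you need to keep the asynchronous callback rather than switch to a reader, one possible alternative (not the TextReader approach itself) is a stateful `Decoder`, which remembers a partial character between calls. A minimal sketch, with `ChunkedDecoding` and `DecodeChunks` as hypothetical names and the two byte arrays standing in for two consecutive receives:

```csharp
using System;
using System.Text;

public static class ChunkedDecoding
{
    // Decodes consecutive byte chunks with ONE Decoder instance, so a character
    // split across two chunks is completed by the next call instead of being lost.
    public static string DecodeChunks(params byte[][] chunks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var sb = new StringBuilder();
        foreach (byte[] chunk in chunks)
        {
            // GetCharCount takes the decoder's buffered partial bytes into account.
            char[] chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length)];
            int n = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            sb.Append(chars, 0, n);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // "Olá" in UTF-8 is 4F 6C C3 A1; here the split falls inside 'á'.
        byte[] first = { 0x4F, 0x6C, 0xC3 };
        byte[] second = { 0xA1 };

        Console.WriteLine(DecodeChunks(first, second) == "Ol\u00E1"); // True

        // Decoding each chunk independently, as in the question, breaks the character.
        string naive = Encoding.UTF8.GetString(first) + Encoding.UTF8.GetString(second);
        Console.WriteLine(naive == "Ol\u00E1"); // False
    }
}
```

In the question's code this would mean storing one `Decoder` in the state object (an assumed extra field) and calling `GetChars` on it in each callback instead of `Encoding.UTF8.GetString`.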

Upvotes: 4

CodingBarfield

Reputation: 3398

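Are you outputting the result to something that can actually display non-ASCII characters? A console or log with a different output encoding may show '?' even when the string itself is correct.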

Upvotes: -1
