Ron

Reputation: 2503

Can textbox.text encoding be ignored?

I have code that reads data from a text box's Text property into a byte array. It uses UTF-8 encoding, and there have not been any issues. The code reads, say, M bytes from the text box and adds them to the output as bytes. That all works fine.

When the data is written back, there are often problems if the text is not English. For instance, take the Chinese character 南 repeated a few times. For the text box, each 南 seems to be the bytes 0xE5, 0x8D, 0x97.

When the data is written back to the text box, if, say, the first write ended on 0xE5, then the next batch of data, which starts with 0x8D 0x97, is somehow transformed into 0xEF 0xBF 0xBD.

I'm just using Array.Copy, nothing special. With English there is no problem. With Chinese (and Japanese as well), the first write goes OK, but the second write contains some of these "corrupted" characters.

Upvotes: 0

Views: 2491

Answers (2)

Ron

Reputation: 2503

First, thanks for that information. I only used Chinese as an example. The code will not know the language and should not care; it could be Hindi or Japanese. Your byte[]-to-string conversion is what I use.

After I posted the question, I realized that the code seems to handle the data correctly; the problem only appears when writing back to the text box's Text property. I'm not sure what the control is doing. Perhaps it "detects" the language, or detects that the data is not UTF-8, and tries some other kind of encoding.

In any case, I deferred writing the bytes back into the text box until the end, and that seems to work just fine. That is to say, I keep adding the bytes to an array using Array.Copy(...), and at the end I write the whole thing back into the text box using UTF-8, as you mentioned.
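
For illustration, here is a minimal sketch of the difference between decoding every batch on its own and decoding once at the end. The class and method names (ChunkDemo, DecodePerChunk, DecodeAtEnd) and the fixed chunkSize parameter are made up for this example and are not the original code, which was never posted. It suggests that the replacement bytes 0xEF 0xBF 0xBD (U+FFFD) may come from decoding an incomplete UTF-8 sequence rather than from the control itself.

```csharp
using System;
using System.Text;

// Hypothetical helper, not the code from the question.
public static class ChunkDemo
{
    // Decodes each chunk on its own. A multi-byte character that straddles
    // a chunk boundary cannot be decoded and becomes U+FFFD.
    public static string DecodePerChunk(byte[] data, int chunkSize)
    {
        var sb = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            sb.Append(Encoding.UTF8.GetString(data, offset, count));
        }
        return sb.ToString();
    }

    // Copies every chunk into one output array with Array.Copy and
    // decodes the complete array a single time at the end.
    public static string DecodeAtEnd(byte[] data, int chunkSize)
    {
        byte[] output = new byte[data.Length];
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            Array.Copy(data, offset, output, offset, count);
        }
        return Encoding.UTF8.GetString(output);
    }
}
```

With the six UTF-8 bytes of "南南" and a chunk size of 4, the first chunk ends on 0xE5, so DecodePerChunk returns text containing U+FFFD, while DecodeAtEnd returns the original string.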

Upvotes: 0

Ashkan Mobayen Khiabani

Reputation: 34180

The problem is probably not related to reading from or writing to the text box. The problem is how you convert the text to bytes and back. You have not provided any code, so my code may not be exactly what you want, but to convert a string to UTF-8 bytes you can do:

byte[] bytes = System.Text.Encoding.UTF8.GetBytes(textBox1.Text);

To convert byte[] to string:

textBox1.Text = System.Text.Encoding.UTF8.GetString(bytes);
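
If the bytes have to be converted in batches rather than all at once, one option is a stateful System.Text.Decoder, which holds on to the trailing bytes of an incomplete character until the next call. This is only a sketch under that assumption; the StreamingDecode class, the DecodeChunks method, and the chunkSize parameter are hypothetical names for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class StreamingDecode
{
    public static string DecodeChunks(byte[] data, int chunkSize)
    {
        // One decoder instance is reused for all chunks so that it can
        // carry a partial multi-byte sequence over to the next call.
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var sb = new StringBuilder();

        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);

            // Size the buffer for the characters this chunk completes.
            char[] chars = new char[decoder.GetCharCount(data, offset, count)];
            int written = decoder.GetChars(data, offset, count, chars, 0);
            sb.Append(chars, 0, written);
        }
        return sb.ToString();
    }
}
```

Each partial result could then be appended to the text box as it arrives, for example with textBox1.AppendText(...), instead of waiting for the complete array.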

If you ignore the encoding and just use ASCII, you will lose data when converting to bytes.
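
As a rough illustration of that data loss, the sketch below round-trips a string through both encodings. AsciiLossDemo and its two methods are names invented for this example.

```csharp
using System.Text;

// Hypothetical helper for illustration only.
public static class AsciiLossDemo
{
    // ASCII cannot represent characters above U+007F, so the default
    // encoder fallback replaces each of them with '?'.
    public static string RoundTripAscii(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(bytes);
    }

    // UTF-8 can represent every Unicode character, so the text survives.
    public static string RoundTripUtf8(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

For the input "南", RoundTripAscii returns "?" while RoundTripUtf8 returns "南".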

There is also a related question about converting Chinese text to byte[]: How to encode and decode Broken Chinese/Unicode characters?

Upvotes: 1
