emilly

Reputation: 10530

Encoding for Unicode characters?

I receive a document from a third party, which I display in the browser with charset UTF-8:

  Content-Type: text/html; charset=utf-8

But some of the characters are displayed as junk. My understanding is that even if they are sending Unicode characters, UTF-8 encoding is appropriate. Should I change the encoding to something else, or is the issue on the sending side? The sending party is using ANSI/ASCII encoding. I believe they should use UTF-8, as ANSI/ASCII is not appropriate for Unicode characters. Is that correct?

Upvotes: 2

Views: 299

Answers (1)

Jesper

Reputation: 206816

Computers can ultimately only handle ones and zeroes (numbers). To represent text in a computer, you need to map numbers to characters. That is exactly what a character encoding is for.

For example, the ASCII character encoding specifies that 65 = A, 66 = B, etc.
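To make that mapping concrete, here is a minimal Python sketch (the sample strings are just illustrations, not anything from the question) showing characters converted to their ASCII numbers and back:

```python
# Characters to numbers: ord() gives the code number of a character.
codes = [ord(ch) for ch in "AB"]

# Numbers to characters: chr() goes the other way.
text = "".join(chr(n) for n in [72, 105])

# Encoding a string as ASCII yields one byte per character,
# and each byte holds that same number.
raw = "AB".encode("ascii")

print(codes)      # [65, 66]
print(text)       # Hi
print(list(raw))  # [65, 66]
```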

There are many different character encodings. ASCII is a very old and limited character encoding that only has room for 128 characters (codes 0 to 127).

UTF-8 is a different character encoding that can encode all characters in the Unicode standard, which encompasses more than 100,000 characters.
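As a rough illustration of the difference, the sketch below (the sample characters are my own choice) shows that UTF-8 uses between one and four bytes per character, while ASCII cannot encode anything outside its 128 codes:

```python
# Sample characters, written as escapes so the source file itself stays ASCII.
samples = {
    "A": "A",
    "e-acute": "\u00e9",
    "euro sign": "\u20ac",
    "emoji": "\U0001F600",
}

# Number of bytes each character occupies when encoded as UTF-8.
byte_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(byte_lengths)  # {'A': 1, 'e-acute': 2, 'euro sign': 3, 'emoji': 4}

# ASCII has no code for the euro sign, so encoding fails.
try:
    "\u20ac".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False
```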

If your page declares that its text is encoded using UTF-8, but in reality it uses a different encoding, then you see garbage on the screen - you told the browser that it's UTF-8 but it's really not, so it's going to interpret the bytes the wrong way. If you get this, then it is almost certainly an issue on the sending side - the sending side must make sure that it really encodes the text using UTF-8 if that's what the Content-Type header says.
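You can reproduce this kind of garbage yourself. The sketch below assumes, purely for illustration, that the other encoding is Windows-1252 (`cp1252` in Python); the question does not say which encoding the sender really uses.

```python
# Hypothetical example text; "\u00e9" is e with an acute accent.
original = "caf\u00e9"

# Case 1: sender encodes as Windows-1252 (assumed), reader decodes as UTF-8.
sent_cp1252 = original.encode("cp1252")  # b'caf\xe9'
shown_1 = sent_cp1252.decode("utf-8", errors="replace")

# Case 2: sender encodes as UTF-8, reader decodes as Windows-1252.
sent_utf8 = original.encode("utf-8")     # b'caf\xc3\xa9'
shown_2 = sent_utf8.decode("cp1252")

# ascii() prints non-ASCII characters as escapes so the result is unambiguous.
print(ascii(shown_1))  # 'caf\ufffd'   (U+FFFD is the replacement character)
print(ascii(shown_2))  # 'caf\xc3\xa9' (two junk characters instead of one letter)
```

In both cases the bytes are unchanged; only the interpretation differs, which is why the declared charset has to match what was actually used to encode the text.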

UTF-8 is appropriate for any kind of text. In my opinion, it should be your default choice of character encoding; only use something else if you have a good reason to do so.

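UTF-8 is compatible with ASCII (ASCII is a subset of UTF-8) - if the sending side is really sending ASCII-encoded text, you should have no problem displaying it using UTF-8. If you get strange characters, then the sending side is most likely not really using ASCII. Note that "ANSI" usually refers to a Windows code page such as Windows-1252, which matches ASCII only for codes 0 to 127; it encodes characters above that range differently than UTF-8 does.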
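If you cannot get the sender to change, one possible workaround is to check whether the bytes are valid UTF-8 and, if not, decode them with the encoding you believe the sender used and re-encode as UTF-8. This is only a sketch: the helper name `to_utf8` and the `cp1252` fallback are my assumptions, and you need to find out the sender's real encoding for it to work reliably.

```python
def to_utf8(data: bytes, fallback: str = "cp1252") -> bytes:
    """Return data as UTF-8 bytes.

    Valid UTF-8 input (which includes pure ASCII) is returned unchanged.
    Anything else is decoded with the assumed fallback encoding and
    re-encoded as UTF-8.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes that are undefined in the fallback become U+FFFD.
        text = data.decode(fallback, errors="replace")
    return text.encode("utf-8")


# Pure ASCII is already valid UTF-8, so it passes through untouched.
print(to_utf8(b"Hello"))      # b'Hello'

# A Windows-1252 byte for e-acute is converted to its UTF-8 form.
print(to_utf8(b"caf\xe9"))    # b'caf\xc3\xa9'
```

Trying UTF-8 first is a reasonable order because random non-UTF-8 text is unlikely to pass as valid UTF-8 by accident, whereas almost any byte sequence decodes "successfully" as Windows-1252.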

Upvotes: 1
