user1010399

Reputation: 2348

How does UTF-8 identify characters from different languages?

I am really amazed by the magic of UTF-8, but I can't understand the logic behind it. I have gone through several documents but I am still confused, as I know only the basics.

Please take a look at the first example. It converts characters of a language to UTF-8. There are two text boxes: enter the characters in the first one, click the button, and the UTF-8 values appear in the second one.

Please take a look at the second example. I took the UTF-8 values from example 1 and put them in HTML, and here I really do not understand how they get translated back into characters. I tested three languages: Chinese, Hindi and Russian.

I used Google Translate to translate from English into several languages:

Hello = 您好 (Chinese)

Hello = नमस्ते (Hindi)

Hello = привет (Russian)

How does a web page identify which language's characters to show on the basis of UTF-8? Is it possible that different computers will show different characters?

Upvotes: 1

Views: 2286

Answers (2)

Joop Eggen

Reputation: 109567

UTF-8 is a variable-length byte encoding of Unicode, the character numbering system for all languages.
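To make "variable-length" concrete, here is a minimal Java sketch. The class and method names are made up for illustration, and the sample characters are taken from the greetings in the question, written as `\u` escapes so the result does not depend on the source file's encoding. It prints each character's Unicode number (code point) next to the bytes UTF-8 uses for it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Bytes {
    // Returns the UTF-8 bytes of a string as space-separated uppercase hex.
    static String utf8Hex(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF because Java bytes are signed.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Latin H, Cyrillic pe, Devanagari na, Han character "you (polite)".
        String[] samples = {"H", "\u043F", "\u0928", "\u60A8"};
        for (String s : samples) {
            System.out.printf("U+%04X -> %s%n", s.codePointAt(0), utf8Hex(s));
        }
        // Prints:
        // U+0048 -> 48
        // U+043F -> D0 BF
        // U+0928 -> E0 A4 A8
        // U+60A8 -> E6 82 A8
    }
}
```

The byte count grows with the code point: one byte below U+0080, two bytes up to U+07FF, three bytes up to U+FFFF, and four bytes beyond that. The bytes themselves carry no language tag; they only encode the Unicode number.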

Web pages have historically defaulted to ISO-8859-1, the so-called Latin-1. Another charset can be set in either of two ways:

  1. HTTP header lines, which precede an empty line and then the HTML content. One of them is this header line:

    Content-Type: text/html; charset=UTF-8
    

    To send it, a Java EE server needs to call:

     response.setContentType("text/html; charset=UTF-8");
    
  2. A meta tag in the HTML head:

    <html>
      <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    ...
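
Why the declared charset matters can be seen in a small Java sketch. This is only an illustration with made-up names, not what a browser literally runs: it takes the UTF-8 bytes of "привет" and decodes them twice, once as UTF-8 and once as ISO-8859-1. A client that assumes the wrong charset would show different characters for the very same bytes, which is how two computers can end up displaying different text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class CharsetMismatch {
    // Interprets the given bytes under the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The Russian greeting from the question, written with \u escapes.
        String original = "\u043F\u0440\u0438\u0432\u0435\u0442";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(utf8, StandardCharsets.UTF_8);
        String asLatin1 = decode(utf8, StandardCharsets.ISO_8859_1);

        // Six Cyrillic letters take two bytes each in UTF-8.
        System.out.println("bytes: " + utf8.length);
        // Decoded as UTF-8, the six original characters come back.
        System.out.println("UTF-8 length: " + asUtf8.length()
                + ", equals original: " + asUtf8.equals(original));
        // Decoded as Latin-1, every byte becomes its own (wrong) character.
        System.out.println("Latin-1 length: " + asLatin1.length()
                + ", equals original: " + asLatin1.equals(original));
    }
}
```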
    

Upvotes: 1

Oded

Reputation: 499062

The "magic" behind UTF-8 is called Unicode. UTF-8 is one of several encodings of that standard.

Unicode does have character ranges that correspond to languages and many characters are specifically associated with a language.
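As a rough illustration of those ranges, Java exposes the Unicode script property through `Character.UnicodeScript.of`. The sketch below uses a hypothetical class name and sample characters picked from the greetings in the question, and looks up the script of the first character of each sample. Note that a script is not the same thing as a language: Han characters are shared by Chinese and Japanese, and Cyrillic by Russian and several other languages.

```java
// Hypothetical demo class, for illustration only.
class ScriptLookup {
    // Returns the Unicode script of the first code point in the text.
    static Character.UnicodeScript scriptOf(String text) {
        return Character.UnicodeScript.of(text.codePointAt(0));
    }

    public static void main(String[] args) {
        // Samples written as \u escapes: Han, Devanagari, Cyrillic, Latin.
        String[] samples = {"\u60A8", "\u0928", "\u043F", "H"};
        for (String s : samples) {
            System.out.printf("U+%04X -> %s%n", s.codePointAt(0), scriptOf(s));
        }
        // Prints:
        // U+60A8 -> HAN
        // U+0928 -> DEVANAGARI
        // U+043F -> CYRILLIC
        // U+0048 -> LATIN
    }
}
```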

I suggest reading this - The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

Upvotes: 2
