Reputation: 2348
I am really amazed by the magic of UTF-8, but I can't understand the logic behind it. I have gone through several documents and am still confused; I only know the basics.
Please take a look at the first example. It converts characters of a language to UTF-8. There are two text boxes: enter the characters in the first one, click the button, and the UTF-8 values appear in the second one.
Please take a look at the second example. I took the UTF-8 values from example 1 and put them in the HTML, and here I really do not understand how they are translated back. I tested three languages: Chinese, Hindi, and Russian.
I used Google Translate to translate "Hello" from English into each language:
Hello = 您好 (Chinese)
Hello = नमस्ते (Hindi)
Hello = привет (Russian)
How does a web page identify a language's characters on the basis of UTF-8? Is it possible that different computers will show different characters?
Upvotes: 1
Views: 2286
Reputation: 109567
UTF-8 is a variable-length byte encoding of Unicode, the character numbering system for all languages.
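To make "variable-length" concrete, here is a minimal sketch that prints the UTF-8 bytes of one character from each of the greetings in the question. The class name `Utf8Lengths` and the helper `utf8Hex` are made up for this illustration; the non-ASCII characters are written as Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class Utf8Lengths {
    // Returns the UTF-8 bytes of the text as space-separated uppercase hex.
    static String utf8Hex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String[] samples = {
            "H",       // Latin letter, U+0048
            "\u043F",  // Cyrillic "п", U+043F
            "\u0928",  // Devanagari "न", U+0928
            "\u60A8"   // Chinese "您", U+60A8
        };
        for (String ch : samples) {
            // Print the Unicode number next to the bytes UTF-8 uses for it.
            System.out.printf("U+%04X -> %s%n", ch.codePointAt(0), utf8Hex(ch));
        }
    }
}
```

The Latin letter takes one byte (48), the Cyrillic letter two (D0 BF), and the Devanagari and Chinese characters three each (E0 A4 A8 and E6 82 A8). The bytes only encode the Unicode number; nothing in them names a language.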
Web pages historically defaulted to ISO-8859-1, the so-called Latin-1. Other charsets can be set by:
HTTP header lines: lines of text that precede an empty line, which is followed by the HTML content. One such header line:
Content-Type: text/html; charset=UTF-8
On a Java EE server, this is done with:
response.setContentType("text/html; charset=UTF-8");
A meta tag in the HTML head:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
...
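As for whether different computers can show different characters: that can happen when the reader assumes a different charset than the one the bytes were written in. The sketch below (the class `CharsetMismatch` and its helper are hypothetical names for this illustration) takes the UTF-8 bytes of the Russian greeting and decodes them once as UTF-8 and once as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class CharsetMismatch {
    // Encodes the text as UTF-8, then decodes those bytes with the assumed charset.
    static String decodeUtf8BytesAs(String text, Charset assumed) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        // "привет", written with escapes to stay independent of the file encoding.
        String hello = "\u043F\u0440\u0438\u0432\u0435\u0442";

        String right = decodeUtf8BytesAs(hello, StandardCharsets.UTF_8);
        String wrong = decodeUtf8BytesAs(hello, StandardCharsets.ISO_8859_1);

        // Decoding with the matching charset restores the original 6 characters.
        System.out.println(right.equals(hello) + ", length " + right.length());
        // Decoding as Latin-1 turns every byte into its own character: 12 of them.
        System.out.println(wrong.equals(hello) + ", length " + wrong.length());
    }
}
```

With the matching charset the six letters come back unchanged; with ISO-8859-1 each of the twelve bytes becomes a separate Latin-1 character, which is the garbled text a browser would display. This is why the header or meta tag has to agree with how the page was actually saved.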
Upvotes: 1
Reputation: 499062
The "magic" behind UTF-8 is called Unicode. UTF-8 is one of several encodings of the Unicode standard.
Unicode does have character ranges that correspond to languages and many characters are specifically associated with a language.
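Those ranges can be inspected directly. As a small sketch (the class `ScriptOfCharacter` and the helper `scriptOf` are names invented here), Java's `Character.UnicodeScript.of` reports which script a code point belongs to:

```java
// Hypothetical demo class, not part of any library.
final class ScriptOfCharacter {
    // Looks up the Unicode script of the first code point in the text.
    static Character.UnicodeScript scriptOf(String text) {
        return Character.UnicodeScript.of(text.codePointAt(0));
    }

    public static void main(String[] args) {
        String[] samples = {
            "H",       // from "Hello"
            "\u60A8",  // "您", first character of the Chinese greeting
            "\u0928",  // "न", first character of the Hindi greeting
            "\u043F"   // "п", first character of the Russian greeting
        };
        for (String ch : samples) {
            System.out.printf("U+%04X -> %s%n", ch.codePointAt(0), scriptOf(ch));
        }
    }
}
```

This should report LATIN, HAN, DEVANAGARI and CYRILLIC. Note that a script is not quite a language: Han characters, for example, are shared by Chinese and Japanese, so the code point alone tells a page which character to draw, not which language the text is in.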
I suggest reading this - The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
Upvotes: 2