Reputation: 1835
What exactly are Unicode character codes? And how are they different from ASCII characters?
Upvotes: 35
Views: 47658
Reputation: 11536
The first 128 Unicode code points are the same as ASCII. Unicode then defines well over 100,000 more.
There are two common encodings for Unicode: UTF-8, which uses 1 to 4 bytes per code point (so for the first 128 characters, UTF-8 is exactly the same as ASCII), and UTF-16, which uses 2 or 4 bytes per code point.
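To make the byte counts concrete, here is a small Python sketch. The sample characters and the helper name `byte_lengths` are my own picks for illustration, and `utf-16-le` is used so that no byte order mark gets added to the count.

```python
# Hypothetical helper (not from any library): returns how many bytes one
# character occupies in UTF-8 and in UTF-16.
def byte_lengths(ch):
    utf8_len = len(ch.encode("utf-8"))
    # "utf-16-le" avoids the 2-byte byte order mark that plain "utf-16" prepends
    utf16_len = len(ch.encode("utf-16-le"))
    return utf8_len, utf16_len

# Sample characters chosen for illustration: U+0041, U+00E9, U+20AC, U+1F600
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")

# For an ASCII character, the UTF-8 bytes and the ASCII bytes are identical
same_as_ascii = "A".encode("utf-8") == "A".encode("ascii")
```

Code points above U+FFFF (like the last sample) are the ones that need 4 bytes in UTF-16, because they are stored as a surrogate pair.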
Upvotes: 14
Reputation: 837916
Unicode is a way to assign unique numbers (called code points) to characters from nearly all languages in active use today, plus many other characters such as mathematical symbols. There are many ways to encode Unicode strings as bytes, such as UTF-8 and UTF-16.
ASCII assigns values to only 128 characters (a-z, A-Z, 0-9, space, some punctuation, and some control characters).
For every character that has an ASCII value, the Unicode code point and the ASCII value of that character are the same.
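You can check this yourself with a few lines of Python. This is just a quick sketch; `ascii_matches_unicode` is a name I made up for the example. `ord()` gives the Unicode code point, and encoding to ASCII gives the single ASCII byte value.

```python
# Hypothetical helper: compares a character's Unicode code point with the
# byte value it gets when encoded as ASCII.
def ascii_matches_unicode(ch):
    code_point = ord(ch)                # Unicode code point
    ascii_value = ch.encode("ascii")[0] # ASCII byte value
    return code_point == ascii_value

# Check all 128 ASCII characters (code points 0 through 127)
all_match = all(ascii_matches_unicode(chr(i)) for i in range(128))
print(all_match)

# A character outside ASCII has a code point but no ASCII value at all
try:
    "\u00e9".encode("ascii")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```

The second part shows the other direction: anything past code point 127 simply cannot be represented in ASCII, which is why the encode call fails.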
In most modern applications you should prefer Unicode strings to ASCII. This allows you, for example, to have users with accented characters in their names or addresses, and to localize your interface to languages other than English.
Upvotes: 54