Nicolas Raoul

Reputation: 60193

What characters are NOT present in Unicode?

I have heard that some characters are not present in the Unicode standard despite being written in everyday life by the populations of some areas. In particular, I have heard about recent Chinese first names fabricated by assembling parts of existing characters, but I can't find any reference for this.

For instance, the character below is very common for 50 million people, yet it was not in Unicode until October 2009:

[image of the character]

Is there a list of such characters? (images, or website listing such characters as images)

Upvotes: 17

Views: 9369

Answers (4)

Tom Patterson

Reputation: 11

Unicode does not support the bilabial trill letter, turned beta, or reversed k.

Upvotes: 1

asmeurer

Reputation: 91460

There are tons of characters from the symbol part of the standard that are annoyingly not included.

See the "Missing symmetric versions" section of https://web.archive.org/web/20210830121541/http://xahlee.info/comp/unicode_arrows.html for a bunch of arrow symbols that exist, but only in certain directions. Some are just silly. For example, there is ⥂, ⥃, and ⥄, but there isn't a right pointing version of the last one.
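A rough way to probe for such gaps yourself is to take a character's official name, swap LEFTWARDS and RIGHTWARDS, and see whether a character with the mirrored name exists. The sketch below does this with Python's standard `unicodedata` module. The helpers `mirrored_name` and `find_mirror` are hypothetical names made up for this illustration, and the name-swapping rule is only a heuristic: it assumes the direction appears as a separate word in the name, and the result depends on the Unicode version bundled with your Python.

```python
import unicodedata


def mirrored_name(name):
    """Swap the words LEFTWARDS and RIGHTWARDS in a character name."""
    swap = {"LEFTWARDS": "RIGHTWARDS", "RIGHTWARDS": "LEFTWARDS"}
    return " ".join(swap.get(word, word) for word in name.split(" "))


def find_mirror(char):
    """Return the character with the left/right-mirrored name, or None."""
    name = unicodedata.name(char, None)
    if name is None:
        return None  # unnamed or unassigned code point
    candidate = mirrored_name(name)
    if candidate == name:
        return None  # the name has no direction word to swap
    try:
        return unicodedata.lookup(candidate)
    except KeyError:
        return None  # no character carries the mirrored name


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    for char in "\u2942\u2943\u2944":  # the three arrows mentioned above
        mirror = find_mirror(char)
        name = unicodedata.name(char, "<no name>")
        print(char, name, "->", mirror if mirror else "no mirrored character found")
```

A missing mirror here only means that no character has the swapped name; it does not rule out a visually similar character encoded under a different name.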

And you can see from http://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts that they picked, apparently at random, which letters to support in superscript and subscript form. For example, the Superscripts and Subscripts block includes the subscript vowels a, e, o, and even schwa (ə), but not i, which is a common subscript in mathematical typesetting (a subscript i, ᵢ, is encoded, but separately, at U+1D62 in the Phonetic Extensions block). Take a look at the Wikipedia article for more details (you'll need a Unicode font installed, because at least at the time of this writing the regular ASCII equivalents are not explicitly listed), but basically they picked about half of the Latin alphabet, seemingly at random, for each of upper- and lower-case superscript and subscript characters.
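If you want to check which subscript letters your own system knows about, you can look them up by name instead of relying on a font. The following is a minimal sketch using Python's standard `unicodedata` module. It assumes that subscript letters are named `LATIN SUBSCRIPT SMALL LETTER X`, so any character named differently would be missed; `subscript_letters` is a hypothetical helper name, and the answer depends on the Unicode version bundled with your Python.

```python
import string
import unicodedata


def subscript_letters(letters=string.ascii_lowercase):
    """Map each letter to its subscript character, or None if no such name exists."""
    found = {}
    for letter in letters:
        name = f"LATIN SUBSCRIPT SMALL LETTER {letter.upper()}"
        try:
            found[letter] = unicodedata.lookup(name)
        except KeyError:
            found[letter] = None  # no character with this name in the database
    return found


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    for letter, char in subscript_letters().items():
        if char is None:
            print(letter, "no subscript form found")
        else:
            print(letter, f"U+{ord(char):04X}", char)
```

The printed code points also show which block each letter landed in, which makes the scattered allocation easy to see.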

Also, a lot of symbols that would be convenient for building shapes with Unicode do not exist.

Upvotes: 3

sleske

Reputation: 83577

Well, there's loads of stuff not present in Unicode (though new characters are still being added).

Some examples:

  • Due to Han Unification, Unicode uses one codepoint for several similar characters from different languages. People disagree whether these characters are really "the same"; if you believe they should be represented separately, then these separate representations could be said to be "missing" (though this is something of a philosophical question).
  • In a similar vein, many languages (especially Asian languages) sometimes have several variants of one character/glyph. The distinction between "one character with several representations" (=one codepoint) and "distinct characters" (=different codepoints) is somewhat arbitrary, thus there are cases (e.g. with Kanji characters) where some people feel alternative variants are "missing".
  • Many historic and rarely used characters are missing.
  • Many old/historic scripts are not covered, e.g. Demotic. Actually, there is an initiative specifically for including more scripts in Unicode, the Script Encoding Initiative (SEI).

There is also a page by the W3C on this topic, Missing characters and glyphs, with more explanations.

Upvotes: 8

jisaacstone

Reputation: 4264

Also: Here's unicode.org's list of unsupported scripts

Upvotes: 9
