Reputation: 1102
What is the likelihood that I'll run into COMBINING LATIN SMALL LETTER C (U+0368) in "real life" (besides clever Scottish folk)?
I'm asking since it's in both the Unicode block Combining Diacritical Marks and the General Category Mark, Nonspacing [Mn].
As a result, it seems to get treated the same as characters such as COMBINING GRAVE ACCENT (U+0300) by utilities such as the ICU Transliterator (using either the suggested "NFD; [:Nonspacing Mark:] Remove; NFC" or a straight "Latin-ASCII" transliteration).
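To illustrate the behaviour, here is a minimal sketch in Python that mimics the "NFD; [:Nonspacing Mark:] Remove; NFC" rule using the standard `unicodedata` module. It is not ICU itself, and the function name `strip_nonspacing_marks` is made up for this example; the point is only that a filter keyed on category Mn cannot tell U+0368 apart from U+0300.

```python
import unicodedata


def strip_nonspacing_marks(text: str) -> str:
    """Approximate 'NFD; [:Nonspacing Mark:] Remove; NFC' (illustrative, not ICU)."""
    # Step 1: decompose, so that precomposed letters expose their combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2: drop every character whose General Category is Mn.
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Step 3: recompose whatever is left.
    return unicodedata.normalize("NFC", kept)


# Both marks are removed, because both are category Mn.
print(strip_nonspacing_marks("e\u0300"))  # e + COMBINING GRAVE ACCENT
print(strip_nonspacing_marks("e\u0368"))  # e + COMBINING LATIN SMALL LETTER C
```

Both calls print a plain `e`, which is the behaviour I'm asking about.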
Upvotes: 2
Views: 631
Reputation: 201568
The likelihood is very close to zero, but not exactly zero. You cannot prevent anyone from using a Unicode character as they like. There is no specific information about U+0368 in the Unicode Standard, but it has definitely been defined as a combining character that will cause a small letter c to be displayed above the preceding character. I would expect to find it mostly in digitized forms of medieval manuscripts, or something like that.
Using it after a space character, as in the “clever” page mentioned, is not the intended use, but not invalid either. Unicode lets you use any combining mark after any character, whether it makes sense or not.
It has no canonical or compatibility decomposition, so there is no clear-cut way to deal with it in a context where you cannot, or do not want to, retain the character.
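The properties mentioned above can be checked with a short Python sketch using the standard `unicodedata` module (the helper name `describe` is just for illustration, and the results depend on the Unicode data shipped with your Python build):

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few Unicode properties of a single character."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        # An empty string means no canonical or compatibility decomposition.
        "decomposition": unicodedata.decomposition(ch),
    }


mark = "\u0368"
print(describe(mark))

# With no decomposition mapping, normalization leaves the mark in place.
sample = "a" + mark
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, unicodedata.normalize(form, sample) == sample)
```

By contrast, a precomposed letter such as U+00E8 has the decomposition `0065 0300`, which is what gives normalization something to work with. For U+0368, any mapping (dropping it, replacing it with a plain "c", and so on) has to be a decision made by the application.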
Upvotes: 2
Reputation: 70145
The likelihood is utterly indeterminate except to say that if you expect it not to occur, then it will occur.
Upvotes: 2