Reputation: 165
I am parsing an XML document containing characters in the Private Use Area of the Sabon font. These characters have to be replaced because the font has to be changed to Times New Roman. So far, everything is fine.
Now I need a replacement for a character which looks like SS (double s, like a ligature of two s). I inspected Times New Roman and didn't find a corresponding character. Does anyone know whether there is such a thing in Unicode?
Upvotes: 0
Views: 1091
Reputation: 201768
This is a bit of a mystery, but I think that the glyph that you are seeing is a small capital glyph for “ß” U+00DF LATIN SMALL LETTER SHARP S, often called “German double s”. For the word you mention in a comment, this would make little sense, because Broussonet was a French naturalist, and French does not use “ß” (and German does not use “ß” for foreign names), so the few occurrences of “Broußonet” that Google finds must be odd misspellings.
The copied string contains Private Use code points that Sabon seems to use for small capitals. This is somewhat weird, since normally small capitals are nowadays included as glyph variants selectable using OpenType features rather than Private Use code points, which are non-portable by definition.
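To see which characters in the parsed text are Private Use code points, a small check along these lines may help. It is only a sketch: the function name `find_private_use` and the sample string are made up for illustration, and it relies on the fact that Private Use code points have the Unicode general category `Co`.

```python
import unicodedata

def find_private_use(text):
    """Return (index, 'U+XXXX') pairs for every Private Use code point in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" = Other, Private Use
    ]

# Hypothetical sample: a name with one Private Use character embedded in it.
sample = "Brou\ue03fonet"
print(find_private_use(sample))  # [(4, 'U+E03F')]
```

Running this over the text nodes of the XML document would list every code point that still needs a replacement decision.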
This still does not explain what is happening, since the string contains “Broussonet” encoded that way, with “ss” represented by two copies of the Private Use code point that Sabon uses for small caps “s”. Presumably, some conversion between “ss” and “ß” is taking place somewhere. Anyway, the “character” in your second comment is U+E03F, a Private Use code point apparently used for small caps “ß” (CFF glyph name germandbls.sc) in Sabon.
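Once a code point has been identified, the replacement itself can be done with a translation table. The following is a minimal sketch, not a complete mapping: only U+E03F comes from the discussion above, the names `PUA_REPLACEMENTS` and `replace_private_use` are hypothetical, and the choice of “ss” rather than “ß” as the replacement is an assumption that suits a French name like Broussonet.

```python
# Hypothetical replacement table. Only U+E03F is taken from the discussion;
# other Private Use code points would have to be added after inspecting the font.
PUA_REPLACEMENTS = {
    0xE03F: "ss",  # apparently small caps sharp s in Sabon; use "ß" instead for German text
}

def replace_private_use(text, table=PUA_REPLACEMENTS):
    """Replace known Private Use code points using str.translate."""
    return text.translate(table)

print(replace_private_use("Brou\ue03fonet"))  # Broussonet
```

Code points that are not in the table are left untouched, so unmapped Private Use characters remain visible for a later pass.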
However, if the text is interpreted as really being in uppercase, with letters other than the first one represented using small caps, and if “SS” is then interpreted as or replaced by the uppercase form of “ß”, then it’s “ẞ” U+1E9E LATIN CAPITAL LETTER SHARP S. In normal German orthography, “ß” maps to “SS” (two copies of normal letter “S”) in uppercasing, but nowadays Unicode also has U+1E9E, to meet the need to preserve differences in spelling, as in Strauss vs. Strauß, when names are written in all-uppercase. Modern versions of Times New Roman have a glyph for “ẞ”, old versions don’t (U+1E9E was added in Unicode version 5.1, in April 2008).
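The case mappings described above can be observed in Python, whose string methods follow the Unicode default case mappings. This is just an illustration; the helper `uppercase_preserving_sharp_s` is a hypothetical name, not a standard function.

```python
import unicodedata

sharp_s = "\u00df"          # ß LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # ẞ LATIN CAPITAL LETTER SHARP S

print(sharp_s.upper())                    # SS
print(capital_sharp_s.lower() == sharp_s) # True
print(unicodedata.name(capital_sharp_s))  # LATIN CAPITAL LETTER SHARP S

# Default uppercasing loses the spelling difference between the two names:
print("Strauß".upper() == "Strauss".upper())  # True

def uppercase_preserving_sharp_s(text):
    """Uppercase text, mapping ß to U+1E9E instead of the default 'SS'."""
    return text.replace(sharp_s, capital_sharp_s).upper()

print(uppercase_preserving_sharp_s("Strauß"))  # STRAUẞ
```

Whether U+1E9E actually renders depends on the installed version of Times New Roman, as noted above.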
Upvotes: 3