Reputation: 139
Thanks to jmcnamara I found a great way to use Unicode characters in xlsxwriter charts: xlsxwrter: rich text format in chart title
I need a list of all Unicode characters to copy from. I found some:
Why is there no alphabet for capital subscript letters? Where can I get those?
Upvotes: 4
Views: 3467
Reputation: 41932
Unicode is a character set that maps characters to numbers. It deals only with plain text and is not meant for formatting text§. You can't make a letter bold or italic, or move it above or below the baseline, purely with Unicode code points (see Create Unicode subscripts and superscripts with combining glyphs).
Characters that seem to represent formatting exist mainly because older standards already had them. The Unicode FAQ states the reason directly:
Q: Why doesn't Unicode have a full set of superscripts and subscripts?
A: The superscripted and subscripted characters encoded in Unicode are either compatibility characters encoded for roundtrip conversion of data from legacy standards, or are actually modifier letters used with particular meanings in technical transcriptional systems such as IPA and UPA. Those characters are not intended for general superscripting or subscripting of arbitrary text strings—for such textual effects, you should use text styles or markup in rich text, instead.
Compatibility is also why the superscript digits ²³¹ often look different from the remaining ones ⁰⁴⁵⁶⁷⁸⁹: many fonts contain only the former set and fall back to another font for the latter. And ¹ (U+00B9) comes after ²³ (U+00B2, U+00B3) because ISO 8859-1 ordered them that way.
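To make the split between the two sets of superscript digits concrete, here is a small sketch using only the Python standard library. The helper name `in_latin1` is mine, not part of any API; it simply checks whether a character can be encoded in ISO 8859-1.

```python
import unicodedata

# The three superscript digits inherited from ISO 8859-1 (Latin-1).
LEGACY = "\u00b9\u00b2\u00b3"
# The remaining superscript digits, encoded in a separate block.
LATER = "".join(chr(cp) for cp in [0x2070, *range(0x2074, 0x207A)])


def in_latin1(ch: str) -> bool:
    """Hypothetical helper: True if ch is encodable in ISO 8859-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


for ch in LEGACY + LATER:
    # Print the code point, the official name, and Latin-1 membership.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch):<24} latin-1: {in_latin1(ch)}")
```

The first three lines of output report code points below U+0100 and `latin-1: True`; the other seven sit at U+2070 and U+2074 to U+2079 and are not encodable in Latin-1.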
In fact, almost anything that seems silly in Unicode is there for compatibility with older character sets. There are many examples of a seemingly unnecessary code point that represents a sequence of characters, such as Nj, Dž, Ⅷ, ㎉, ㎓, ﷽. Similarly, there are many questionable emoji such as the “copyright” ©️, “registered trademark” ®️ and “trademark” ™️ symbols. People had used them in other character sets before, so Unicode had to include them so that text could be converted to and from those character sets without loss.
§ More information about rich text in Unicode:
Rich Text. Also known as styled text. The result of adding information to plain text. Examples of information that can be added include font data, color, formatting information, phonetic annotations, interlinear text, and so on. The Unicode Standard does not address the representation of rich text. It is expected that systems and applications will implement proprietary forms of rich text. Some public forms of rich text are available (for example, ODA, HTML, and SGML). When everything except primary content is removed from rich text, only plain text should remain.
https://unicode.org/glossary/#rich_text (emphasis mine)
Q: What is the difference between “rich text” and “plain text”?
A: Rich text is text with all its formatting information: typeface, point size, weight, kerning, and so on. Plain text is the underlying content stream to which formatting is applied.
One key distinction between the two is that rich text breaks the text up into runs and applies uniform formatting to each run. As such, rich text is inherently stateful. Plain text is not stateful. It should be possible to lose the first half of a block of plain text without any impact on rendering.
Unicode, by design, only deals with plain text. It doesn't provide a generalized solution to rich text issues.
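As a rough illustration of the "runs" idea in the quotes above, the sketch below models rich text as a list of runs, each with its own formatting, and renders it either as HTML markup or as plain text. The `Run` class and both functions are invented for this example; they are not taken from any particular library.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Run:
    """A stretch of text with uniform formatting (hypothetical model)."""
    text: str
    script: str = "normal"  # one of "normal", "sub", "super"


def to_plain(runs):
    """Drop all formatting; only the underlying content remains."""
    return "".join(run.text for run in runs)


def to_html(runs):
    """Render runs as HTML, using <sub>/<sup> for shifted text."""
    tags = {"sub": "sub", "super": "sup"}
    parts = []
    for run in runs:
        body = escape(run.text)
        tag = tags.get(run.script)
        parts.append(f"<{tag}>{body}</{tag}>" if tag else body)
    return "".join(parts)


# Capital subscript letters are no problem once formatting lives in markup.
formula = [Run("C"), Run("A", "sub"), Run(" + C"), Run("B", "sub")]
print(to_plain(formula))  # CA + CB
print(to_html(formula))   # C<sub>A</sub> + C<sub>B</sub>
```

The subscript information lives entirely in the run metadata, so any letter can be subscripted, and stripping the formatting leaves ordinary plain text.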
Upvotes: 5
Reputation: 9533
Unicode is not about formatting, and it treats subscript (and superscript) as formatting, so it is not really supported. Unfortunately, HTML has a different opinion (e.g. in select/option tags).
Unicode subscripts are mostly there for compatibility with older character sets: Unicode wants to be compatible with most of the character sets created before 1991.
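If you want to check which subscript characters actually exist, one approach is to scan the Unicode database for code points whose compatibility decomposition is tagged `<sub>`. The sketch below does that with the standard library; `subscript_targets` is a name I chose, and the result depends on the Unicode version bundled with your Python.

```python
import sys
import unicodedata


def subscript_targets():
    """Map each base string to the subscript character that decomposes to it."""
    found = {}
    for cp in range(sys.maxunicode + 1):
        decomp = unicodedata.decomposition(chr(cp))
        if decomp.startswith("<sub>"):
            # The decomposition looks like "<sub> 0032"; parse the hex code points.
            base = "".join(chr(int(h, 16)) for h in decomp.split()[1:])
            found[base] = chr(cp)
    return found


subs = subscript_targets()
capitals = [c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c in subs]

print("available:", " ".join(sorted(subs)))
print("capital letters with a subscript form:", capitals)
```

On the Unicode versions I am aware of, the list of capitals comes back empty, while digits, a few operators and some lowercase letters are present.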
That is why many subscripts and superscripts are missing. The good news: if you follow Unicode's recommendation and use markup or text styles, you can subscript or superscript any character, in any formatting, and even nest them (double or triple subscripts).
If you want better support (and all characters), use other formatting techniques. Sorry not to have better news for you.
Upvotes: 2