Reputation: 2775
I'm seeing somewhat unusual behavior around the rendering of 誤 in the browser (it reproduces in both Firefox and Chrome), which I'm having trouble explaining.
Specifically, check out the Wiktionary page for 誤:
Notice that there are 3 variations marked in black bold:
The relation between 2 and 3 is clear: 2 represents the traditional character and 3 represents the simplified character. But what does 1 represent? I've tried the following:
So what is going on with this unusual character rendering and copy-pasting behavior? How can I reproduce character 1 (and not 2) in other applications?
FWIW, when I look at a Chinese dictionary, the stroke order shows character 2 even though the browser renders the character as 1.
Upvotes: 2
Views: 225
Reputation: 16198
This is a z-variant, and in this case probably an example of Han unification.
From https://www.zdic.net/hans/%E8%AA%A4:
You can see that the first character (marked as 内地 Mainland China) is what you're getting in the headword.
The headword on Wiktionary is formatted with lang=zh, whereas the example sentences use zh-Hans and zh-Hant respectively. That is the core of this, along with likely-subtags fallback.
Most systems dealing with locales perform locale fallback using likely subtags. So Hans without any country specified typically implies CN, and Hant implies TW during fallback. The reverse is also true: CN implies Hans and TW implies Hant (and some other regions such as HK imply Hant as well). Hans/Hant are the script codes for Simplified and Traditional Chinese, and CN/TW are the region codes for China and Taiwan respectively. zh on its own implies zh-CN (and thus zh-Hans-CN).
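As a rough way to check the likely-subtags expansion described above, the sketch below uses the standard Intl.Locale API, whose maximize() method applies the CLDR "Add Likely Subtags" algorithm. The helper name expandTag and the list of tags are just for illustration, and the exact results depend on the CLDR data shipped with your browser or Node.js build. Note that this only shows how a tag is expanded; how a given font or browser then uses that information to pick glyphs is a separate step.

```javascript
// Expand a BCP 47 language tag with its likely script and region subtags.
// expandTag is a hypothetical helper name, not part of any library.
function expandTag(tag) {
  return new Intl.Locale(tag).maximize().toString();
}

// Tags taken from the discussion above.
const tags = ["zh", "zh-Hans", "zh-Hant", "zh-CN", "zh-TW", "zh-HK"];

// Map each tag to its maximized form, e.g. "zh" becomes "zh-Hans-CN".
const expanded = Object.fromEntries(tags.map((t) => [t, expandTag(t)]));

console.log(expanded);
```

You can paste this into the browser's developer console to see what your own system infers for each tag.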
Fallback also need not always occur the same way: different fonts have different priorities (e.g. a Mainland Chinese font may assume CN by default unless explicitly told otherwise).
I made a little table; the screenshot shows how different language tags render on my system when run on Wikipedia (snippet at the bottom of this post).
The font's actually defaulting to Noto Sans CJK JP unless I put it in a class=Hant context (where it switches to Noto Sans CJK TC).
What's happening under the hood is: traditional vs simplified is not unified in Unicode, but z-variants like these are. Even though zh implies zh-Hans-CN, because this is a traditional character, the font will not use the Hans to pick a Simplified character: it must pick a traditional character, since the Simplified form is encoded as a different code point. So you get the Mainland Chinese traditional variant in zh contexts (like the headword), but since zh-Hant implies zh-TW, the font is happy to oblige and give you the Taiwanese (still traditional) variant in the example sentence.
Note that not all cases stick to a single font: sometimes the choice of language (or the precise CSS used) can force a different font to be selected. Additionally, you can have z-variants crop up in different contexts without needing to change the language. For example, the Cantonese possessive 嘅 can be built as ⿰口既 or ⿰口旣; the choice is not clearly locale-based and seems to vary freely between fonts.
Code for table above:
<table>
<tr lang=zh><td>zh</td><td>誤</td></tr>
<tr lang=zh-Hans><td>zh-Hans</td><td>誤</td></tr>
<tr lang=zh-Hant><td>zh-Hant</td><td>誤</td></tr>
<tr lang=zh-CN><td>zh-CN</td><td>誤</td></tr>
<tr lang=zh-Hant-CN><td>zh-Hant-CN</td><td>誤</td></tr>
<tr lang=zh-Hans-CN><td>zh-Hans-CN</td><td>誤</td></tr>
<tr lang=zh-TW><td>zh-TW</td><td>誤</td></tr>
<tr lang=zh-HK><td>zh-HK</td><td>誤</td></tr>
<tr lang=zh-Hans-TW><td>zh-Hans-TW</td><td>誤</td></tr>
<tr lang=ja><td>ja</td><td>誤</td></tr>
<tr lang=ko><td>ko</td><td>誤</td></tr>
<tr lang=vi><td>vi</td><td>誤</td></tr>
</table>
Upvotes: 1
Reputation: 2775
(Based on a Twitter discussion with manishearth)
The difference is coming up due to variations across fonts (called z-variants). Specifically, based on the language tag, the browser can pick different fonts within the same font family (e.g. sans-serif). For example, on my device:
These two fonts render the character differently. The lang tag is different in different parts of the HTML, causing different font selection and hence different rendering.
Outside the browser, the variant that gets rendered can also change depending on the language context. There is more discussion of this, with examples, on the Han Unification Wikipedia page.
Upvotes: 0