iago-lito

Reputation: 3208

Why are there holes in the Unicode table?

Given this area of the Unicode table, for instance:

  ...
π‘Ž    U+1D44E Dec:119886       MATHEMATICAL ITALIC SMALL A 𝑎
𝑏    U+1D44F Dec:119887       MATHEMATICAL ITALIC SMALL B 𝑏
𝑐    U+1D450 Dec:119888       MATHEMATICAL ITALIC SMALL C 𝑐
𝑑    U+1D451 Dec:119889       MATHEMATICAL ITALIC SMALL D 𝑑
𝑒    U+1D452 Dec:119890       MATHEMATICAL ITALIC SMALL E 𝑒
𝑓    U+1D453 Dec:119891       MATHEMATICAL ITALIC SMALL F 𝑓
𝑔    U+1D454 Dec:119892       MATHEMATICAL ITALIC SMALL G 𝑔
𝑖    U+1D456 Dec:119894       MATHEMATICAL ITALIC SMALL I 𝑖 # what?!
𝑗    U+1D457 Dec:119895       MATHEMATICAL ITALIC SMALL J 𝑗
π‘˜    U+1D458 Dec:119896       MATHEMATICAL ITALIC SMALL K 𝑘
𝑙    U+1D459 Dec:119897       MATHEMATICAL ITALIC SMALL L 𝑙
π‘š    U+1D45A Dec:119898       MATHEMATICAL ITALIC SMALL M 𝑚
𝑛    U+1D45B Dec:119899       MATHEMATICAL ITALIC SMALL N 𝑛
π‘œ    U+1D45C Dec:119900       MATHEMATICAL ITALIC SMALL O 𝑜
  ...

I would naturally expect U+1D455 to be MATHEMATICAL ITALIC SMALL H, but it does not seem to be defined in any table I have looked at.

Why are there holes in the Unicode table? (See also U+1D49D, U+1D53A, etc.)
Is there any way I can fill them?


[EDIT]: These links do state:

The "holes" in the alphabetic ranges are filled by previously defined characters in the Letterlike Symbols block shown below.

and

The Unicode Consortium adds new codepoints to the standard all the time. Visit their website to find out about pending codepoints and whether this one is in the pipe. The following table shows typical representations of how the codepoint would look, if it existed. This may help you when debugging, but is not of real use otherwise.

But I just... don't understand what they mean :\

Upvotes: 16

Views: 1287

Answers (2)

First of all, sorry for necroposting, but I believe that if I ended up here through a Google search where it was the first or second result, many other people might, too, and they will be as confused as I was.

I don't have a final answer, but I wanted to point out that iago-lito's answer is not completely right. The hole seems to be a genuine mistake, whether from the Unicode Consortium, the operating systems I've used to check, or the typeface designers. Well, at least in the case of that specific h: there is the ℎ that's used for the Planck constant, but there is no glyph that would fit what we would consider the mathematical italic small h, that is, a regular-width italic serif lowercase h.

My suspicion is that, at the time, most people used serif typefaces everywhere: LaTeX defaults to a serif typeface (Computer Modern), many scientific writing guides such as APA recommend Times New Roman, and browsers usually have Times New Roman as the default serif and default typeface. So it could be that the Planck constant h was always rendered as serif, but now, since we use sans-serif typefaces, it's displayed as sans-serif, and there seems to be no way to get a proper, regular-weight serif small letter h. Bear in mind that the Planck constant code point doesn't have a dedicated glyph; font files just "redirect" the code point to the glyph of whatever letter h they use, so that's why I think that's a possibility, even if it doesn't make much sense when you think about it.

It's also important to note that many characters have several identical versions throughout Unicode. In fact, there is an entire sans-serif alphabet between U+1D5A0 (MATHEMATICAL SANS-SERIF CAPITAL A) and U+1D5D3 (MATHEMATICAL SANS-SERIF SMALL Z), so it's puzzling why they decided not to add this one letter. People have speculated that it's because of how 'famous' the other one was, and you do want backwards compatibility. But that doesn't answer it for me, as adding it wouldn't actually break compatibility. It would just mean that they used the wrong one, and now there is a right one.

Of course, I'm not entirely sure it is a problem in the Unicode Consortium's standard. It could be a mistake in the typefaces; maybe they should have used a serif h for Planck's constant. But this seems to be widespread regardless of font file, and, at the very least, it isn't clear what typeface designers should have done.

I have now submitted a request for information to the Unicode Consortium as to whether they plan to add the letter. Hopefully they will, as the code point is still reserved. At least they were this smart.

Meanwhile, you can use the mathematical bold italic small h, 𝒉, which is code point U+1D489, or &#x1D489; in HTML. That's all for now, at least.
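
If you want to check this workaround yourself, here is a minimal sketch in Python (my choice of language, nothing in the thread prescribes it) using only the standard library. It looks up the name of U+1D489, shows its UTF-8 bytes, and confirms that the HTML numeric character reference decodes to the same character. The variable name is just for illustration.

```python
import html
import unicodedata

# U+1D489 is a code point, not an 8-bit value; in UTF-8 it takes four bytes.
bold_italic_h = chr(0x1D489)

print(unicodedata.name(bold_italic_h))      # official character name
print(bold_italic_h.encode("utf-8").hex())  # UTF-8 bytes as hex digits

# The HTML numeric character reference decodes to the same character.
print(html.unescape("&#x1D489;") == bold_italic_h)
```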

Upvotes: 1

iago-lito

Reputation: 3208

From the comments (cheers guys), I have learnt that these holes exist because some characters were already assigned in Unicode by the time the whole alphabets were added.

For instance: before the U+1D4* MATHEMATICAL ITALIC SMALL * identifiers were defined, ℎ was already known in the table as

ℎ    U+210E Dec:008462        PLANCK CONSTANT ℎ # here it is

So in order to keep the numbering consistent but NOT duplicate ℎ, a hole was left at position U+1D455.


Similarly, ℬ is known as U+212C SCRIPT CAPITAL B rather than U+1D49D, which is reserved in the MATHEMATICAL SCRIPT CAPITAL letters family.

Similarly, ℂ from the MATHEMATICAL DOUBLE-STRUCK CAPITAL letters family is not U+1D53A because it was already known as U+2102 DOUBLE-STRUCK CAPITAL C.

This was a difficult choice, having to deal with backward compatibility, consistency and reliability all together :)
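
As for "filling" the holes in practice, one way is to redirect each reserved code point to the older character that stands in for it. The following is only a sketch in Python: `HOLE_FILLERS` and `math_italic_small` are names I made up for illustration, and the table lists only the three holes mentioned in this thread, not every hole in the block.

```python
import unicodedata

# Reserved code points in the mathematical alphabets, mapped to the older
# characters that fill them. Only the three holes from this thread are
# listed; this table is NOT exhaustive.
HOLE_FILLERS = {
    0x1D455: 0x210E,  # italic small h          -> PLANCK CONSTANT
    0x1D49D: 0x212C,  # script capital B        -> SCRIPT CAPITAL B
    0x1D53A: 0x2102,  # double-struck capital C -> DOUBLE-STRUCK CAPITAL C
}


def math_italic_small(letter: str) -> str:
    """Map one ASCII lowercase letter to its mathematical italic form."""
    if len(letter) != 1 or not ("a" <= letter <= "z"):
        raise ValueError("expected a single ASCII lowercase letter")
    # MATHEMATICAL ITALIC SMALL A sits at U+1D44E; the rest follow in order.
    code_point = 0x1D44E + (ord(letter) - ord("a"))
    # Redirect the hole (if any) to its previously defined character.
    return chr(HOLE_FILLERS.get(code_point, code_point))


for hole, filler in HOLE_FILLERS.items():
    hole_name = unicodedata.name(chr(hole), "<unassigned>")
    filler_name = unicodedata.name(chr(filler))
    print(f"U+{hole:04X} {hole_name} -> U+{filler:04X} {filler_name}")

print("".join(math_italic_small(c) for c in "abcdefghij"))
```

With this, `math_italic_small("h")` returns the Planck constant character U+210E, while every other letter maps straight into the U+1D44E range.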

Upvotes: 18
