Hammerite

Reputation: 22350

What is the "U\+[0-9A-F]{4,6}" notation for specifying a Unicode character called?

What is the name of this notation? For example, if I want to say of the character U+2603 SNOWMAN,

The _____ of the Snowman character is "U+2603".

what should replace the _____ so that the statement is accurate as written, but would stop being accurate if "U+2603" were replaced by something else, like "2603" or "9731"?

The Wikipedia page for Unicode describes the convention of writing U+ and then some hexadecimal digits, without giving it a name.

Upvotes: 0

Views: 189

Answers (2)

Jukka K. Korpela

Reputation: 201886

The notation has no official name. The Unicode standard, v. 7, says in clause 2.4:

When referring to code points in the Unicode Standard, the usual practice is to refer to them by their numeric value expressed in hexadecimal, with a “U+” prefix. (See Appendix A, Notational Conventions.)

Appendix A says:

In running text, an individual Unicode code point is expressed as U+n, where n is four to six hexadecimal digits, using the digits 0–9 and uppercase letters A–F (for 10 through 15, respectively). Leading zeros are omitted, unless the code point would have fewer than four hexadecimal digits—for example, U+0001, U+0012, U+0123, U+1234, U+12345, U+102345.

  • U+0416 is the Unicode code point for the character named cyrillic capital letter zhe.

The U+ may be omitted for brevity in tables or when denoting ranges.

So the thing closest to an official name would be “U+n notation”. But it isn’t given as a name; it’s just part of a description, with n being a placeholder.
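For illustration, the formatting rule quoted from Appendix A can be sketched in a few lines of Python. This is only a minimal sketch; the function name `format_code_point` is my own and not anything defined by the standard.

```python
def format_code_point(cp: int) -> str:
    """Format an integer code point in the U+n style described in Appendix A."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp!r}")
    # ":04X" pads with leading zeros up to four uppercase hex digits
    # and leaves longer values (five or six digits) untouched.
    return f"U+{cp:04X}"


print(format_code_point(ord("\u2603")))  # U+2603
print(format_code_point(0x12))           # U+0012
print(format_code_point(0x102345))       # U+102345
```

The range check reflects the fact that code points run from 0 to 10FFFF in hexadecimal, which is why six digits are always enough.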

In the notation, the “U+” part simply signals that the following digits are to be interpreted as a code point written in hexadecimal. So you can say “The code point of the Snowman character is 2603 in hexadecimal” or “The code point of the Snowman character is U+2603”.
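Going the other way, a string in this notation can be turned back into a number by dropping the prefix and reading the rest as hexadecimal. The sketch below assumes the strict pattern from the question title (uppercase "U", uppercase hex digits, four to six of them); the names `U_PLUS` and `parse_code_point` are hypothetical helpers of mine.

```python
import re

# Pattern from the question title, with a capture group around the digits.
U_PLUS = re.compile(r"U\+([0-9A-F]{4,6})")


def parse_code_point(text: str) -> int:
    """Return the code point denoted by a string such as 'U+2603'."""
    match = U_PLUS.fullmatch(text)
    if match is None:
        raise ValueError(f"not in U+n notation: {text!r}")
    cp = int(match.group(1), 16)  # the digits are hexadecimal
    if cp > 0x10FFFF:
        raise ValueError(f"beyond the Unicode code space: {text!r}")
    return cp


cp = parse_code_point("U+2603")
print(cp)       # 9731
print(chr(cp))  # the snowman character itself
```

Note that a bare "2603" is rejected here, which is exactly the distinction the question asks about: without the prefix, nothing in the string says that the digits are hexadecimal or that they denote a code point.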

There is seldom any need to distinguish between the notations 2603 and U+2603. You just use whichever is more suitable on practical grounds, and explain it if necessary. But here’s an example of a case where a distinction needs to be made and how it can be made: In Microsoft Office Word, you can enter a Unicode character by entering its code number in hexadecimal and then pressing Alt+X; however, if the preceding character is a letter A–F, a–f, X, or x or a digit 0–9, you need to precede the code number by the two characters “U+” or “u+”. (Note that any name for the notation would not help much, especially since no name is generally known and understood.)

Upvotes: 2

BoltClock

Reputation: 724522

Strictly speaking, the term that would fill in the blank is code point:

The code point of the Snowman character is "U+2603".

This term is first used in this section of the Wikipedia article for Unicode:

In text processing, Unicode takes the role of providing a unique code point—a number, not a glyph—for each character.

And the connection between it and the "U+" notation is made a little further down:

Normally a Unicode code point is referred to by writing "U+" followed by its hexadecimal number. For code points in the Basic Multilingual Plane (BMP), four digits are used (e.g. U+0058 for the character LATIN CAPITAL LETTER X); for code points outside the BMP, five or six digits are used, as required (e.g. U+E0001 for the character LANGUAGE TAG and U+10FFFD for the character PRIVATE USE CHARACTER-10FFFD).

However, the notation itself doesn't have a name, probably because it doesn't need one. It's just a way of representing a code point in writing. The only documents I can find on the Web that make a reference to the notation simply call it "the U+nnnn notation" or something similar. Even the Unicode spec makes no direct reference to the notation; it simply uses it when referring to a code point.

If "U+2603" were "2603" instead, then I would probably say:

The Unicode hexadecimal value of the Snowman character is 2603.

Likewise for "9731":

The Unicode decimal value of the Snowman character is 9731.
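To make the point concrete that these are all the same number written differently, here is a small Python sketch using only the standard library. The variable names are mine, chosen for illustration.

```python
import unicodedata

snowman = "\u2603"
code_point = ord(snowman)  # the code point as a plain integer

character_name = unicodedata.name(snowman)
hex_value = format(code_point, "X")  # uppercase hexadecimal, no prefix
decimal_value = str(code_point)      # decimal

print(character_name)  # SNOWMAN
print(hex_value)       # 2603
print(decimal_value)   # 9731
print(int(hex_value, 16) == int(decimal_value))  # True
```

Only the written form changes between "2603" and "9731"; the code point itself is a single integer.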

Upvotes: 2
