Reputation: 31

What is a "noncharacter" in Unicode?

I don't know what "noncharacter" characters are. They are forbidden Unicode characters, though I can copy and paste them, like U+FFFF. If a character has a fixed position in Unicode, and can be used to display something, then:

Why are those characters "noncharacter"?
What is the point of classifying them as not a character, as they hold a position on a table and can be displayed (though as a replacement character) in HTML and CSS, even?
What's the point in having so many empty spaces in Unicode, like in the "Specials" (FFF0-FFFF) block?

Upvotes: 3

Answers (1)

Rob Napier

Reputation: 299495

The Specials block isn't empty. Several of the code points in that block are assigned. Most famously (and importantly), REPLACEMENT CHARACTER (U+FFFD) is in that block. And while U+FFFE itself is not a character, the very important byte sequence FF FE (the byte order mark, U+FEFF, as encoded in little-endian UTF-16) can appear at the beginning of files, so it's useful that U+FFFE not be an otherwise legal character: a reader that finds U+FFFE where it expected the BOM knows it has the byte order backwards. (The related U+FEFF is technically a character, ZERO WIDTH NO-BREAK SPACE, but its use as a character is deprecated.) If new "specials" are needed, there are several slots still available for them, while staying within that block.
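To make the byte-order point concrete, here is a minimal Python sketch. The helper name `detect_utf16_byte_order` is made up for illustration; only the standard `codecs` BOM constants are real. It shows that a BOM read with the wrong byte order comes out as the noncharacter U+FFFE:

```python
import codecs

def detect_utf16_byte_order(data: bytes) -> str:
    """Guess the UTF-16 byte order from a leading BOM (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big-endian"
    return "unknown"

# The same text, BOM included, serialized in both byte orders.
le = "\ufeffhi".encode("utf-16-le")
be = "\ufeffhi".encode("utf-16-be")

print(detect_utf16_byte_order(le))
print(detect_utf16_byte_order(be))

# Decoding the little-endian BOM bytes as big-endian gives U+FFFE,
# which can never be a real character, so the mismatch is unambiguous.
swapped = le[:2].decode("utf-16-be")
print(f"U+{ord(swapped):04X}")
```

If U+FFFE were an ordinary assigned character, a file starting with FF FE could be either a little-endian BOM or that character in big-endian, and this check would not be reliable.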

Unicode prefers to group like things together into blocks with convenient sizes (always a multiple of 16 code points), and so there wind up being some left-over values at the end of various blocks that aren't currently assigned. The total Unicode space is over a million code points. Fewer than 300k have been allocated, so there's a lot of room to keep things tidy.

The official noncharacters (the last two code points, xFFFE and xFFFF, of each of the 17 planes, plus FDD0-FDEF) leave room for "special uses" of values that you know will never be a character. Byte-order detection via the BOM is the most famous of these uses, but implementations can use them internally for other purposes if desired. All told, there are 66 of them out of over a million code points, so it's not a big cost to offer some future flexibility.
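As a quick sanity check on those numbers, the following sketch enumerates the noncharacters directly. The function name `is_noncharacter` is my own, not a standard library API; the ranges are the ones described above (U+FDD0 through U+FDEF, plus the last two code points of every plane):

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if code point cp is one of Unicode's noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # The contiguous range of 32 noncharacters in the BMP.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of each plane: xFFFE and xFFFF.
    return (cp & 0xFFFE) == 0xFFFE

total_code_points = 0x110000  # 17 planes of 65,536 code points each
noncharacters = [cp for cp in range(total_code_points) if is_noncharacter(cp)]

print(total_code_points)   # 1114112
print(len(noncharacters))  # 66
print(is_noncharacter(0xFFFD), is_noncharacter(0xFFFE))  # False True
```

That is 32 from the FDD0-FDEF range plus 2 for each of the 17 planes, which gives the 66 mentioned above.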

Upvotes: 2
