Reputation: 31
I don't know what "noncharacter" characters are. They are forbidden Unicode characters, though I can copy and paste them, like U+FFFF (). If a character has a fixed position in Unicode, and can be used to display something, then:
Upvotes: 3
Views: 702
Reputation: 299495
The Specials block isn't empty. Several of the code points in that block are assigned. Most famously (and importantly), REPLACEMENT CHARACTER (U+FFFD) is in that block. And while it's not technically a character, the very important byte sequence FF FE (the little-endian UTF-16 BOM) can appear at the beginning of files. A decoder that reads those two bytes in the wrong order sees U+FFFE, so it's useful that U+FFFE not be an otherwise legal character: seeing it tells you the byte order is swapped. (The related U+FEFF is technically a character, but its use as a character is deprecated.) If new "specials" are needed, there are several slots still available for them, while staying within that block.
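To make the BOM point concrete, here is a minimal Python sketch. The helper name `detect_utf16_byte_order` is made up for illustration, and it deliberately ignores UTF-32 (whose little-endian BOM also starts with FF FE), so treat it as a demonstration rather than a complete encoding sniffer.

```python
import codecs
from typing import Optional

# U+FEFF serialized in each UTF-16 byte order.
le = "\ufeff".encode("utf-16-le")  # bytes FF FE
be = "\ufeff".encode("utf-16-be")  # bytes FE FF

# Reading the little-endian BOM bytes as if they were big-endian
# yields the code unit 0xFFFE, which is a noncharacter.
misread = int.from_bytes(le, "big")


def detect_utf16_byte_order(data: bytes) -> Optional[str]:
    """Hypothetical helper: guess UTF-16 byte order from a leading BOM.

    Assumption: the data is UTF-16 if it has a BOM at all; UTF-32 is
    not considered here.
    """
    if data[:2] == codecs.BOM_UTF16_LE:
        return "little"
    if data[:2] == codecs.BOM_UTF16_BE:
        return "big"
    return None  # no BOM, so the byte order is unknown


print(hex(misread))
print(detect_utf16_byte_order(le + "hi".encode("utf-16-le")))
```

Because U+FFFE can never be a real character, a decoder that sees it at the start of a stream can safely conclude the bytes are swapped instead of wondering whether the text really begins with that character.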
Unicode prefers to group like things together into blocks that start and end on multiples of 16, and so there wind up being some left-over values at the end of various blocks that aren't currently assigned. The total Unicode space is over a million code points (1,114,112, to be exact). Fewer than 300k have been allocated, so there's a lot of room to keep things tidy.
The official noncharacters (U+nFFFE and U+nFFFF at the end of each of the 17 planes, plus U+FDD0 through U+FDEF) leave room for "special uses" of values that you know will never be a character. The BOM is the most famous of these uses, but implementations can use them internally for other purposes if desired. All told, there are 66 of them out of over a million code points, so it's not a big cost to offer some future flexibility.
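If you want to check the count yourself, the rule is simple enough to write down. This is a small sketch; `is_noncharacter` is my own helper name, not something from the standard library.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # The contiguous range U+FDD0..U+FDEF (32 code points).
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: U+nFFFE and U+nFFFF.
    return (cp & 0xFFFE) == 0xFFFE


# Walk the whole code space: 17 planes of 65,536 code points each.
noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
print(len(noncharacters))  # 32 + 17 * 2 = 66
```

Note that U+FFFD (REPLACEMENT CHARACTER) and U+FEFF are ordinary assigned characters and are not matched by this test.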
Upvotes: 2