Reputation: 107
I'm writing a routine that saves large numbers to a file, but instead of writing the actual number as a string (e.g. 999999), I'd like to use its equivalent Unicode character (e.g. 𘚟), regardless of whether it actually corresponds to a visible or recognizable character. Excluding surrogate pairs, does anyone know which numerical values correspond to a SINGLE Unicode character? I'm asking because I noticed that certain numerical values correspond to a two-character Unicode code point. For example, 999999 corresponds to 𘚟, whereas 999998 corresponds to 𘚟.
Upvotes: 0
Views: 4249
Reputation: 39158
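Unicode code points currently end at 10_ffff₁₆ = 1_114_111₁₀, and the range d800₁₆ to dfff₁₆ is reserved for surrogates, so every other value from 0 up to that limit is a single code point. Some languages are able to relax the upper limit, e.g. Perl: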
#!/usr/bin/env perl
use Encode qw(encode);  # provides encode(); the lax "UTF8" encoding accepts values beyond Unicode
"\x{7fff_ffff_ffff_ffff}";
# ÿ¿¿¿¿¿¿¿¿¿¿
encode "UTF8", "\x{7fff_ffff_ffff_ffff}";
# 0xff 0x80 0x87 0xbf 0xbf 0xbf 0xbf 0xbf 0xbf 0xbf 0xbf 0xbf 0xbf
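For languages that stick to the standard range, a minimal sketch of the check might look like the following. It is written in Python rather than Perl, and the helper names (`is_scalar_value`, `number_to_utf8`, `utf8_to_number`) are made up for illustration; it assumes the file is written as UTF-8.

```python
MAX_CODE_POINT = 0x10FFFF           # 1_114_111, the upper limit quoted above
SURROGATES = range(0xD800, 0xE000)  # reserved for UTF-16 pairs, not characters


def is_scalar_value(n: int) -> bool:
    """True if n can be stored as exactly one Unicode code point."""
    return 0 <= n <= MAX_CODE_POINT and n not in SURROGATES


def number_to_utf8(n: int) -> bytes:
    """Encode n as the UTF-8 bytes of the code point with that value."""
    if not is_scalar_value(n):
        raise ValueError(f"{n} is not a Unicode scalar value")
    return chr(n).encode("utf-8")


def utf8_to_number(data: bytes) -> int:
    """Inverse of number_to_utf8 for a single encoded code point."""
    return ord(data.decode("utf-8"))
```

Under these assumptions 999999 (f423f₁₆) is a single code point that takes four bytes in UTF-8, and 1_112_064 distinct values are usable in total. Anything above ffff₁₆ is still one code point, but it takes two 16-bit code units in UTF-16, which may be why some values appear as "two characters" in tools that count UTF-16 units.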
Upvotes: 2