Reputation: 59
In these shell commands, it seems that "\u" can only decode four hex digits:
root@bemoan[15:36:29]:~# echo -e '\U30'
0
root@bemoan[15:47:01]:~# echo -e '\u30'
0
root@bemoan[15:47:06]:~# echo -e '\u23f0'
⏰
root@bemoan[15:48:40]:~# echo -e '\u1f340'
ἴ0
root@bemoan[15:49:06]:~# echo -e '\U1f340'
🍀
For U+1F340, "\u1f340" doesn't work, but "\U1f340" works. Why is that? (I am using the bash shell.)
Upvotes: 1
Views: 2624
Reputation: 11415
You are using the command echo -e. Take a look at the entry for echo in section 4.2 Bash Builtin Commands of the Bash Reference Manual. It says that "If the -e option is given, interpretation of the following backslash-escaped characters is enabled." Among those characters are:
\uHHHH
the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHH (one to four hex digits)
\UHHHHHHHH
the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHHHHHH (one to eight hex digits)
This is exactly what you are seeing.
When you execute echo -e '\u1f340', the \u (lowercase 'u') tells echo to read at most four hexadecimal digits, not five, as the Unicode character value. echo reads 1f34 and prints U+1F34 GREEK SMALL LETTER IOTA WITH PSILI AND OXIA. It then reads the remaining character in the string, 0, and prints it literally. This gives what you see: ἴ0.
When you execute echo -e '\U1f340', the \U (uppercase 'U') tells echo to read up to eight hexadecimal digits as the Unicode character value. Only five digits are present here, so echo reads all five and prints U+1F340 FOUR LEAF CLOVER. This gives what you see: 🍀.
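Because \U stops only after eight hex digits or at the first non-hex character, a hex digit that happens to follow the code point would be absorbed into it. One way to avoid that is to always pass all eight digits. Here is a minimal sketch, assuming bash 4.2 or later and a UTF-8 locale; pad8 is a hypothetical helper name, not a bash builtin:

```shell
#!/usr/bin/env bash

# Hypothetical helper: zero-pad a hex code point to exactly eight digits,
# so that \U consumes all of them and nothing after them.
pad8() {
    printf '%08x' "0x$1"
}

# Same character as '\U1f340', written with the full eight digits.
echo -e "\U$(pad8 1f340)"

# With eight digits supplied, the trailing 0 is printed as a literal
# character instead of being read as part of the code point.
echo -e "\U$(pad8 1f34)0"
```

The second command is the \U equivalent of what \u1f340 did in the question: the code point 1f34 followed by a literal 0.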
Upvotes: 2