whizz waltz

Reputation: 59

What's the difference between "\U" and "\u" for getting Unicode?

In these shell commands, it seems that "\u" can only decode four hex digits:

root@bemoan[15:36:29]:~# echo -e '\U30'
0
root@bemoan[15:47:01]:~# echo -e '\u30'
0
root@bemoan[15:47:06]:~# echo -e '\u23f0'
⏰
root@bemoan[15:48:40]:~# echo -e '\u1f340'
ἴ0  
root@bemoan[15:49:06]:~# echo -e '\U1f340'
🍀

For U+1F340, "\u1f340" doesn't work, but "\U1f340" does. Why is that? (I am using the bash shell.)

Upvotes: 1

Views: 2624

Answers (1)

Jim DeLaHunt

Reputation: 11415

You are using the command echo -e. Take a look at the entry for echo in section 4.2 Bash Builtin Commands of the Bash Reference Manual. It says that "If the -e option is given, interpretation of the following backslash-escaped characters is enabled." Among those characters are:

\uHHHH

the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHH (one to four hex digits)

\UHHHHHHHH

the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHHHHHH (one to eight hex digits)

This is exactly what you are seeing.

When you execute echo -e '\u1f340', the \u (lowercase 'u') tells echo to read at most four hexadecimal digits, not five, as the Unicode character value. echo reads 1f34 and prints U+1F34 GREEK SMALL LETTER IOTA WITH PSILI AND OXIA. It then treats the remaining character in the string, 0, as ordinary text and prints it. This gives what you see: ἴ0.

When you execute echo -e '\U1f340', the \U (uppercase 'U') tells echo to read up to eight hexadecimal digits as the Unicode character value. Only five digits are present here, so echo reads all of 1f340 and prints U+1F340 FOUR LEAF CLOVER. This gives what you see: 🍀.
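To make the digit limits concrete, here is a small sketch you can run yourself. It assumes bash 4.2 or later (where, as far as I know, echo -e gained the \u and \U escapes). The variable names are only illustrative. The ASCII cases should behave the same in any locale, while the clover line assumes a UTF-8 locale.

```shell
#!/usr/bin/env bash
# \u consumes at most four hex digits; a fifth digit is printed literally.
# U+0041 is "A", so '\u00410' is "A" followed by a literal "0".
short=$(echo -e '\u00410')

# \U consumes up to eight hex digits. Zero-padding to the full eight digits
# makes it unambiguous where the code point ends.
clover=$(echo -e '\U0001f340')

# Both forms accept fewer digits than their maximum.
zero_u=$(echo -e '\u30')
zero_U=$(echo -e '\U30')

printf '%s\n' "$short" "$clover" "$zero_u" "$zero_U"
```

If text that begins with a hex digit must directly follow the character, padding to the full width (four digits for \u, eight for \U) keeps echo from consuming the following digits as part of the code point.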

Upvotes: 2
