How to move the cursor in the bash shell when echoing emojis?

Question

I am writing a game engine for Bash using the cursor movement feature described here. However, if I echo emojis or other UTF-8 characters that span more than 1 byte, the cursor position seems to get messed up.

For example, the following code is supposed to echo "1🔈3", move the cursor back 3 positions, and then echo "abc" in the same place. Ideally, the result should be only "abc". Instead, I see "1abc":

~ $ echo -e "1🔈3\033[3Dabc"
1abc

A similar problem can be illustrated with the carriage return:

~ $ echo -e "1🔈3\rabc"
abc3

Is there any good way of resolving this? I am using the Terminal app on macOS. Is there any portable way of doing this?

Note: not all UTF-8 characters seem to behave this way. So far, I have only been able to reproduce this issue with emojis:

~ $ while true; do read -r -p "Enter emoji: " x; echo "$x" | hexdump; echo -e "1${x}3\033[3Dabc"; done
Enter emoji: 🔈
0000000 f0 9f 94 88 0a                                 
0000005
1abc
Enter emoji: ♞
0000000 e2 99 9e 0a                                    
0000004
abc
Enter emoji: ☞
0000000 e2 98 9e 0a                                    
0000004
abc
Enter emoji: 😋
0000000 f0 9f 98 8b 0a                                 
0000005
1abc
Enter emoji: 🃘
0000000 f0 9f 83 98 0a                                 
0000005
abc
Enter emoji: 🀖
0000000 f0 9f 80 96 0a                                 
0000005
abc
Enter emoji: 𝕭
0000000 f0 9d 95 ad 0a                                 
0000005
abc
Enter emoji: 🇺🇸
0000000 f0 9f 87 ba f0 9f 87 b8 0a                     
0000009
1abc
Enter emoji: ✎
0000000 e2 9c 8e 0a                                    
0000004
abc

that other guy · Accepted Answer

The problem happens because a 😋 is actually rendered across two columns. On my system, the four emoji and the eight digits below are equally long:

😋😋😋😋
12345678

It's expected that a single Wide character will require two Narrow characters to overwrite it.

Treating these emoji as wide is recommended by Unicode TR51-16:

Current practice is for emoji to have a square aspect ratio, deriving from their origin in Japanese. For interoperability, it is recommended that this practice be continued with current and future emoji. They will typically have about the same vertical placement and advance width as CJK ideographs.

Given the recommendation, I would be comfortable simply hard coding anything in the "Emoticons" Unicode block as being wide. Your other symbols that work, such as 🀖 and ☞, are not in the Emoticons block (they're in Mahjong Tiles and Miscellaneous Symbols respectively).
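As a rough illustration of the hard-coding idea, here is a minimal Python sketch. The helper name `columns_hardcoded` is made up for this example, and the range used is the Emoticons block (U+1F600 to U+1F64F). Note that 🔈 from the question is U+1F508, which sits in Miscellaneous Symbols and Pictographs rather than Emoticons, so a real table would probably need to cover more than this one block.

```python
# Hypothetical helper: treat every character in the Emoticons block
# (U+1F600..U+1F64F) as 2 columns wide and everything else as 1.
EMOTICONS = range(0x1F600, 0x1F650)

def columns_hardcoded(text):
    """Count terminal columns using only the hard-coded Emoticons range."""
    return sum(2 if ord(ch) in EMOTICONS else 1 for ch in text)

print(columns_hardcoded("1😋3"))  # 😋 is inside the block
print(columns_hardcoded("1♞3"))   # ♞ is outside the block
print(columns_hardcoded("1🔈3"))  # 🔈 is also outside, so it is undercounted
```

The last line shows the limitation: the single-block rule counts 🔈 as one column even though the terminal in the question draws it across two.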

If you want to determine the width at runtime, you can e.g. ask Python, whose unicodedata module reports the East Asian Width of 😋 as Wide (W) and of ♞ as Neutral (N):

$ python3 -c 'import sys; import unicodedata as u; print(u.east_asian_width(sys.argv[1]))' 😋
W

$ python3 -c 'import sys; import unicodedata as u; print(u.east_asian_width(sys.argv[1]))' ♞
N
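The one-liners above look at a single character. To get the width of a whole string, one possible approach is to sum the per-character widths. The sketch below assumes that Wide and Fullwidth characters take 2 columns, that combining marks and format characters take none, and that everything else takes 1. The function name `display_width` is invented for this example, and actual terminals may disagree with these rules.

```python
import unicodedata

def display_width(text):
    """Estimate how many terminal columns text occupies."""
    width = 0
    for ch in text:
        # Assumption: combining marks and format characters use no column.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # Wide (W) and Fullwidth (F) count as 2 columns, the rest as 1.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

for sample in ("1🔈3", "1♞3", "abc"):
    print(sample, display_width(sample))
```

A Bash script could call something like this once per string and use the result as the count in `\033[<n>D`.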

🇺🇸 is a bit of a special case since it's composed of two different Regional Indicator Symbols with separate code points, but Python labels each of them as Neutral, so if you count each Neutral character as 1 column, it'll still add up to 2.
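Applied to the example from the question, the column count can be used to build the cursor-back escape sequence. This is only a sketch under the same assumption (Wide/Fullwidth = 2 columns, everything else including Neutral = 1). The names `cell_width` and `overwrite` are hypothetical, and the trailing padding is an extra added here so that leftover columns are blanked out.

```python
import unicodedata

def cell_width(ch):
    # Wide/Fullwidth count as 2 columns; everything else, including Neutral, as 1.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def overwrite(old, new):
    """Return a string that prints old, moves the cursor back over it, then prints new."""
    old_cols = sum(cell_width(ch) for ch in old)
    new_cols = sum(cell_width(ch) for ch in new)
    padding = " " * max(0, old_cols - new_cols)  # blank out any leftover columns
    return f"{old}\033[{old_cols}D{new}{padding}"

# repr() shows the escape sequence instead of sending it to the terminal.
print(repr(overwrite("1🔈3", "abc")))
print(repr(overwrite("🇺🇸", "ab")))
```

For "1🔈3" this moves back 4 columns rather than 3, and for the flag it moves back 2, matching the reasoning above.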
