Reputation: 55
I am writing a game engine for Bash using the cursor movement feature described here. However, if I echo emojis or other UTF-8 characters that span more than 1 byte, the cursor position seems to get messed up.
For example, the following code is supposed to echo "1😋3", move the cursor back 3 positions, and then echo "abc" in the same place. Ideally, the result should be just "abc". Instead, I see "1abc":
~ $ echo -e "1😋3\033[3Dabc"
1abc
A similar problem can be illustrated with a carriage return:
~ $ echo -e "1😋3\rabc"
abc3
Is there any good way of resolving this? I am using the Terminal app on macOS. Is there any portable way of doing this?
Note: not all UTF-8 characters seem to behave this way. So far, I have only been able to reproduce this issue with emoji:
~ $ while true; do read -p "Enter emoji: " x; echo $x | hexdump; echo -e "1${x}3\033[3Dabc"; done
Enter emoji: 🔈
0000000 f0 9f 94 88 0a
0000005
1abc
Enter emoji: ♞
0000000 e2 99 9e 0a
0000004
abc
Enter emoji: ☞
0000000 e2 98 9e 0a
0000004
abc
Enter emoji: 😋
0000000 f0 9f 98 8b 0a
0000005
1abc
Enter emoji: 🃘
0000000 f0 9f 83 98 0a
0000005
abc
Enter emoji: 🀖
0000000 f0 9f 80 96 0a
0000005
abc
Enter emoji: 𝕭
0000000 f0 9d 95 ad 0a
0000005
abc
Enter emoji: 🇺🇸
0000000 f0 9f 87 ba f0 9f 87 b8 0a
0000009
1abc
Enter emoji: ✎
0000000 e2 9c 8e 0a
0000004
abc
Upvotes: 4
Views: 509
Reputation: 8446
Try this:
s="1π3" ; printf "$s"; sleep 2; printf "\033[$((${#s}+1))Dabc%${#s}s\n" ' '
I've put a delay between the two printfs so it's easier to see what happens. First there's:
1😋 3
Two seconds later the above is overwritten with:
abc
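How it works: We put the Unicode stuff in a string, $s. ${#s} returns the length of that string: in bytes under a single-byte locale such as C, or in characters under a UTF-8 locale (where the emoji counts as 1 even though it is 2 columns wide, which the +1 makes up for). The length is used in $((${#s}+1)) to calculate how many columns back to move, then %${#s}s tells printf how many spaces it needs (plus a few more) to overwrite any leftover characters.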
If "a few more" spaces is too many, counting the overwriting string gives a more precise result:
s="1π3" t="abc"
printf "${s}"; sleep 2; printf "\033[$((${#s}+1))D$t%$((1+${#s}-${#t}))s\n" ''
Upvotes: 2
Reputation: 123650
The problem happens because a 😋 is actually rendered across two columns. On my system, the four emoji and eight digits are equally long:
😋😋😋😋
12345678
It's expected that a single Wide character will require two Narrow characters to overwrite it.
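Counting columns explains the question's output. With a 2-column emoji such as 😋 (U+1F60B, from the question's hexdump test), the string 1😋3 occupies columns 0 through 3, so the cursor ends in column 4; \033[3D moves it to column 1, and abc lands right after the 1. A minimal sketch, assuming the emoji really is 2 columns wide, moves back 4 columns instead and adds the standard "erase to end of line" sequence \033[K (not used in the question) so nothing is left over:

```shell
#!/usr/bin/env bash
# U+1F60B as raw UTF-8 bytes, so the script does not depend on the
# editor's or terminal's encoding settings.
emoji=$'\xf0\x9f\x98\x8b'

# Assumption: the emoji is 2 columns wide, so "1" + emoji + "3" spans
# 1 + 2 + 1 = 4 columns and the cursor ends up in column 4.
cols=4

# Move back 4 columns to column 0, erase to end of line (\033[K),
# then print "abc" on the now-empty line.
seq=$(printf '1%s3\033[%dD\033[Kabc' "$emoji" "$cols")
printf '%s\n' "$seq"
```

The same arithmetic explains the carriage-return case: \r returns to column 0, abc covers columns 0 to 2 (the 1 and both halves of the emoji), and the 3 in column 3 survives, giving abc3.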
Treating these emoji as wide is recommended by Unicode TR51-16:
Current practice is for emoji to have a square aspect ratio, deriving from their origin in Japanese. For interoperability, it is recommended that this practice be continued with current and future emoji. They will typically have about the same vertical placement and advance width as CJK ideographs.
Given the recommendation, I would be comfortable simply hard-coding anything in the "Emoticons" Unicode block (U+1F600..U+1F64F) as wide. Your other symbols that work, such as 🀖 and ♞, are not in the Emoticons block (they're in Mahjong Tiles and Miscellaneous Symbols, respectively).
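The hard-coding approach can be done in pure bash, without Python. This is a minimal sketch under stated assumptions: the input is well-formed UTF-8, and the helper names codepoints, display_width and WIDE_RANGES are hypothetical. It decodes the raw bytes from od (so it does not depend on the shell's locale), counts code points in the Emoticons block as 2 columns and everything else as 1, then moves back by the computed width and erases to end of line with \033[K. Note that this heuristic alone misses wide emoji outside that block, such as U+1F508 (🔈) from the question's test, which lives in Miscellaneous Symbols and Pictographs; add ranges as needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the code points of a UTF-8 string, one per
# line, in decimal. Decodes raw bytes from od, so the shell's locale does
# not matter. Assumes well-formed UTF-8 input.
codepoints() {
  local b cp=0 need=0
  for b in $(printf '%s' "$1" | od -An -v -tu1); do
    if (( need > 0 )); then          # continuation byte 10xxxxxx
      cp=$(( (cp << 6) | (b & 0x3F) ))
      need=$(( need - 1 ))
      if (( need == 0 )); then echo "$cp"; fi
    elif (( b < 0x80 )); then        # 1-byte sequence (ASCII)
      echo "$b"
    elif (( b < 0xC0 )); then        # stray continuation byte: skip it
      :
    elif (( b < 0xE0 )); then        # 2-byte lead 110xxxxx
      cp=$(( b & 0x1F )); need=1
    elif (( b < 0xF0 )); then        # 3-byte lead 1110xxxx
      cp=$(( b & 0x0F )); need=2
    else                             # 4-byte lead 11110xxx
      cp=$(( b & 0x07 )); need=3
    fi
  done
}

# The answer's heuristic: the Emoticons block (U+1F600..U+1F64F) is
# 2 columns wide, everything else is 1. Add more "low high" ranges here.
WIDE_RANGES=( "0x1F600 0x1F64F" )

display_width() {
  local cp range lo hi w total=0
  for cp in $(codepoints "$1"); do
    w=1
    for range in "${WIDE_RANGES[@]}"; do
      read -r lo hi <<< "$range"
      if (( cp >= lo && cp <= hi )); then w=2; break; fi
    done
    total=$(( total + w ))
  done
  echo "$total"
}

# Move back by the computed width instead of a hard-coded 3.
emoji=$'\xf0\x9f\x98\x8b'   # U+1F60B, from the question's hexdump test
w=$(display_width "1${emoji}3")
printf '1%s3\033[%dD\033[Kabc\n' "$emoji" "$w"
```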
If you want to determine the width at runtime, you can, e.g., ask Python, which helpfully reports their East Asian Width as Full/Wide, even though Unicode tables predating version 9.0 labeled them Neutral:
$ python3 -c 'import sys; import unicodedata as u; print(u.east_asian_width(sys.argv[1]))' 😋
W
$ python3 -c 'import sys; import unicodedata as u; print(u.east_asian_width(sys.argv[1]))' ♞
N
🇺🇸 is a bit of a special case, since it's composed of two different Regional Indicator Symbols with separate code points, but Python labels each of them as Neutral, so if you count each as 1 column, it still adds up to 2.
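The runtime approach can be turned into a width function for a whole string. This is a minimal sketch, assuming python3 is on PATH and ships Unicode 9.0 or later data (Python 3.6+), where emoji such as U+1F60B are W; the name py_display_width is hypothetical. It counts W and F as 2 columns and everything else as 1, so the flag's two Neutral regional indicators add up to 2 as described above. The string is piped in and decoded as UTF-8 explicitly, so the result does not depend on the shell's locale.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the East Asian Width of each code point,
# counting W (Wide) and F (Fullwidth) as 2 columns and everything else
# as 1. Caveat: combining marks and other zero-width characters are
# (incorrectly) counted as 1 in this sketch.
py_display_width() {
  printf '%s' "$1" | python3 -c '
import sys, unicodedata
s = sys.stdin.buffer.read().decode("utf-8")
print(sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1 for c in s))
'
}

emoji=$'\xf0\x9f\x98\x8b'                    # U+1F60B
flag=$'\xf0\x9f\x87\xba\xf0\x9f\x87\xb8'     # U+1F1FA U+1F1F8
py_display_width "1${emoji}3"
py_display_width "$flag"
```

The result can be fed straight into the cursor movement, e.g. printf '1%s3\033[%dDabc\n' "$emoji" "$(py_display_width "1${emoji}3")".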
Upvotes: 3