Kai Huppmann

Reputation: 10775

Why does stdout decoding fail when adding carriage return?

The following Java code does exactly what is expected:

1      String s = "♪♬♪♪♬♪♪♬♪♪♬♪♪♬♪♪♬♪";
2      for(int i=0; i < s.length(); i++)
3      {
4         System.out.print(s.substring(i,i+1));
5         //System.out.print("\r");
6         Thread.currentThread().sleep(500);
7      }

But when I add a carriage return by uncommenting line 5, it prints ?s instead of the notes. Why is that, and how can I fix it?

(I also tried with "\u240d" for carriage return - same thing).

EDIT: The output goes to a bash shell on Mac OS X.

Upvotes: 3

Views: 428

Answers (3)

Andrzej Doyle

Reputation: 103797

I expect it is due to how your terminal is interpreting the output.

As has been pointed out above, all of the note glyphs are multibyte characters. Additionally, Java chars are just 16 bits wide, so a single char cannot reliably represent a single Unicode character on its own, and consequently the String.substring method isn't wholly multibyte friendly.

Thus what is likely happening is that on each iteration through the loop, Java prints out half a character, as it were. When the first byte of a pair is printed out, the terminal realises it's the first half of a multibyte character and doesn't display it. When the next byte is printed, the terminal sees the full character corresponding to the note and displays it.

When you uncomment the print("\r"), you insert a carriage return between the two halves of each character. The terminal then never gets the byte sequence (e.g. 0x26, 0x6C) that represents the note; it gets 0x26, 0x0D, 0x6C, 0x0D instead, so the note is not rendered.
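If split characters are the suspect, one way to rule them out is to step through the string by code point instead of by char. Here is a minimal sketch, assuming a made-up helper class `CodePointSteps` (not from the question); it relies only on `String.codePointAt` and `Character.charCount` from the standard library, and it does not change what encoding the terminal receives:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the asker's code.
final class CodePointSteps {

    // Splits s into one String per Unicode code point, so a surrogate
    // pair is never cut in half the way substring(i, i + 1) can cut it.
    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int codePoint = s.codePointAt(i);
            int width = Character.charCount(codePoint); // 1 or 2 chars
            parts.add(s.substring(i, i + width));
            i += width;
        }
        return parts;
    }

    public static void main(String[] args) throws InterruptedException {
        // U+266A and U+266C, written as escapes so the source file
        // encoding does not matter for this sketch.
        String s = "\u266A\u266C\u266A";
        for (String glyph : split(s)) {
            System.out.print(glyph);
            System.out.print("\r");
            Thread.sleep(500);
        }
    }
}
```

Each print call then always hands a whole character to the output stream, so anything that still shows up as ? would point at the encoding rather than at the loop.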

Upvotes: 3

Jason Orendorff

Reputation: 45106

Java doesn't know that your source file is UTF-8.

If you compile with

javac -encoding utf8 MyClass.java

and run with

java -Dfile.encoding=utf8 MyClass

it will work.
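If you'd rather not depend on the command-line flag, the output encoding can also be pinned in code by wrapping System.out in a PrintStream with an explicit charset. This is only a sketch, assuming your terminal expects UTF-8; the class name `Utf8Out` and the helper `utf8Stream` are made up for the example:

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper for illustration.
final class Utf8Out {

    // Wraps any OutputStream in an auto-flushing PrintStream that
    // encodes characters as UTF-8, independent of file.encoding.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this is unexpected.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        out.print("\u266A"); // eighth note, U+266A
        out.print("\r");
        out.flush();
    }
}
```

With this, U+266A goes out as the three bytes E2 99 AA, followed by 0D for the carriage return.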

(Does anyone know why UTF-8 isn't the default?)

Upvotes: 1

sascha

Reputation: 41

Please also print s.length(); I bet it is more than 18. The Java string representation is UTF-16, and String.substring just extracts the char values. The musical notes start at 0x1d000, so they don't fit in a single char. To extract complete code points/glyphs from a string, use something like UCharacterIterator from the ICU project.

PS: I don't know if your terminal session can display those chars at all.
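The length check can be done with the standard library alone (this swaps ICU for the built-in `String.codePointCount`, so take it as an alternative, not as what ICU does). A small sketch with a made-up class `LengthCheck`; the G clef at U+1D11E is used here only as a character that is known to need two chars:

```java
// Hypothetical helper for illustration.
final class LengthCheck {

    // True when s contains at least one well-formed surrogate pair,
    // i.e. a character that does not fit in a single 16-bit char.
    static boolean hasSurrogatePairs(String s) {
        return s.codePointCount(0, s.length()) != s.length();
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // U+1D11E MUSICAL SYMBOL G CLEF
        System.out.println(clef.length());                         // prints 2
        System.out.println(clef.codePointCount(0, clef.length())); // prints 1
        System.out.println(hasSurrogatePairs(clef));               // prints true
    }
}
```

Running the same check on the string from the question tells you whether substring(i, i+1) can ever cut a character in half there.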

Upvotes: 4
