Reputation: 2600
The following line
Files.write(Paths.get("test.txt"), Arrays.asList("ü"), StandardCharsets.UTF_8);
should write a ü
in test.txt, encoded in UTF-8. At least that is what I expect it to do. But when I open the file in a text editor, the editor shows
ü
and the editor states that it reads the file as UTF-8. I even tried two editors, and both show the same unexpected result. A hex editor shows
c3 83 c2 bc 0d 0a
The last two bytes are carriage return and line feed; that's fine. But the first two bytes should have been c3 bc
... since that is the UTF-8 encoding of ü
in UTF-8 (according to https://www.utf8-zeichentabelle.de/)
The Java source file itself is encoded in UTF-8, as confirmed by both editors.
What am I missing? Why is the ü
not encoded in UTF-8, even though I explicitly passed the charset to Files.write()
?
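
To rule out the editors entirely, the written bytes can be read back and dumped in hex. Here is a minimal sketch of that check; it uses the \u00FC escape so the result does not depend on how the source file itself is encoded (the class name WriteCheck is just an illustration):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class WriteCheck {
    public static void main(String[] args) throws IOException {
        Path p = Paths.get("test.txt");
        // \u00FC is ü, spelled in pure ASCII, so no source-encoding mismatch is possible.
        Files.write(p, Arrays.asList("\u00FC"), StandardCharsets.UTF_8);

        // Read the raw bytes back and print them in hex.
        byte[] bytes = Files.readAllBytes(p);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim());
        // Correct output starts with "c3 bc", followed by the platform
        // line separator (0d 0a on Windows, 0a elsewhere), because the
        // Iterable overload of Files.write appends one after each line.
    }
}
```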
Upvotes: 1
Views: 855
Reputation: 109547
Try the ASCII escape "\u00FC" instead of the literal "ü". If that suddenly works, it means that the editor uses a different encoding (UTF-8) than the javac compiler (Cp1252). By the way, passing StandardCharsets.UTF_8 is redundant here: this overload of Files.write defaults to UTF-8 anyway.
The Java source was saved by the editor as UTF-8, so the ü became two bytes with the high bit set. The javac compiler (probably) decoded the file as Cp1252 and turned those two bytes into two separate chars, which, re-encoded as UTF-8, add up to four bytes.
So the compiler encoding had to be set explicitly (javac -encoding UTF-8), in this case for the test sources as well.
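
The double-encoding described above can be reproduced directly, without involving the compiler. The following sketch decodes the UTF-8 bytes of ü as windows-1252 (the Charset name for Cp1252) and re-encodes the result, yielding exactly the four bytes from the question:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // The UTF-8 encoding of ü (\u00FC) is the two bytes c3 bc.
        byte[] utf8Bytes = "\u00FC".getBytes(StandardCharsets.UTF_8);

        // Decoding those bytes as Cp1252 maps each byte to its own char:
        // 0xC3 -> 'Ã' (U+00C3) and 0xBC -> '¼' (U+00BC).
        String misread = new String(utf8Bytes, Charset.forName("windows-1252"));

        // Re-encoding the misread string as UTF-8 produces four bytes,
        // matching the hex dump in the question: c3 83 c2 bc.
        byte[] doubleEncoded = misread.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : doubleEncoded) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // c3 83 c2 bc
    }
}
```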
Upvotes: 2