Reputation: 4165
I ran into a problem with Unicode character serialization and deserialization. Here is a sample program that writes a char to a file and then tries to read it back. The written and read chars (ch and ch2) are different. Any suggestions why I get this behavior?
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class MainClass {
    public static void main(String[] args) {
        try {
            File outfile = new File("test.txt");
            FileOutputStream fos = new FileOutputStream(outfile);
            OutputStreamWriter writer = new OutputStreamWriter(fos, "UTF-16");
            FileInputStream fis = new FileInputStream(outfile);
            InputStreamReader reader = new InputStreamReader(fis, "UTF-16");

            char ch = 56000;
            System.out.println(Integer.toBinaryString(ch));

            // Write the char and close the writer so it is flushed to the file.
            writer.write(ch);
            writer.close();

            // Read the char back and print its bit pattern for comparison.
            char ch2 = (char) reader.read();
            System.out.println(Integer.toBinaryString(ch2));
            reader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
UPD: I found empirically that this happens only for values in the range 55296-57343.
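For reference, here is a minimal sketch (not part of the original program) of how that range can be confirmed without any file I/O, by round-tripping every 16-bit char value through a UTF-16 encode/decode. It assumes Java 7+ for StandardCharsets; the class name SurrogateScan is made up.

import java.nio.charset.StandardCharsets;

public class SurrogateScan {
    public static void main(String[] args) {
        // Round-trip every 16-bit char value through UTF-16 and remember
        // the first and last values that do not survive the cycle.
        int firstBad = -1, lastBad = -1;
        for (int c = 0; c <= 0xFFFF; c++) {
            String original = String.valueOf((char) c);
            byte[] bytes = original.getBytes(StandardCharsets.UTF_16);
            String decoded = new String(bytes, StandardCharsets.UTF_16);
            if (!original.equals(decoded)) {
                if (firstBad < 0) firstBad = c;
                lastBad = c;
            }
        }
        // Prints 55296 - 57343, i.e. 0xD800..0xDFFF, the surrogate range.
        System.out.println(firstBad + " - " + lastBad);
    }
}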
Upvotes: 4
Views: 174
Reputation: 1109512
Character 56000 is U+DAC0, which is not a valid Unicode character on its own: it is a high surrogate. Surrogates are only meaningful in pairs, where they point to characters outside the 16-bit-wide BMP (Basic Multilingual Plane).
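A small sketch to illustrate the point: an unpaired surrogate is malformed UTF-16, so the encoder behind OutputStreamWriter silently substitutes the replacement character U+FFFD, while a proper surrogate pair for a supplementary code point survives the round trip. The class SurrogateDemo and the roundTrip helper are illustrative, not from the question; only the Writer/Reader setup mirrors the original program.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class SurrogateDemo {
    public static void main(String[] args) throws Exception {
        char ch = 56000; // 0xDAC0, a high surrogate
        System.out.println(Character.isHighSurrogate(ch)); // true

        // Unpaired surrogate: the encoder replaces it with U+FFFD on write.
        System.out.println(Integer.toHexString(roundTrip(String.valueOf(ch)).charAt(0))); // fffd

        // A real surrogate pair (U+1D11E, MUSICAL SYMBOL G CLEF) round-trips intact.
        String clef = new String(Character.toChars(0x1D11E)); // "\uD834\uDD1E"
        System.out.println(roundTrip(clef).codePointAt(0) == 0x1D11E); // true
    }

    // Same Writer/Reader setup as in the question, but backed by a byte array.
    static String roundTrip(String s) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        OutputStreamWriter writer = new OutputStreamWriter(bos, "UTF-16");
        writer.write(s);
        writer.close();
        InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(bos.toByteArray()), "UTF-16");
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) sb.append((char) c);
        reader.close();
        return sb.toString();
    }
}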
Upvotes: 6