Terrence

Reputation: 45

Java read unicode with \u

My Java program reads Unicode from a text file, e.g. \uffff. Viewing the text in the Java GUI is no problem, but when I try to print it, all the words are overwritten. Is that because of \u, and is there a way to avoid the overwritten words?

Sorry about my broken English. Thanks.

Upvotes: 3

Views: 48271

Answers (2)

santu

Reputation: 685

As you already know, '\u', also known as a Unicode escape, is used to represent an international character. When you can't enter a character from the keyboard, you can use the Unicode escape sequence to produce it.

However, if such international characters are already present in a text file, then of course you can read them. Java provides the class Charset; please refer to the API at http://docs.oracle.com/javase/1.4.2/docs/api/java/nio/charset/Charset.html

You should use the Reader/Writer API in Java to deal with such characters, because it works with 16-bit chars (UTF-16 code units), which cover characters from many languages beyond ASCII. InputStream/OutputStream, by contrast, work only with raw 8-bit bytes and do no character decoding.
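To make the byte versus character distinction concrete, here is a minimal sketch (the class and method names are made up for illustration). It pushes the UTF-8 encoding of the euro sign through a plain InputStream and through an InputStreamReader and counts what each one delivers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class BytesVersusChars {

    // Counts the raw bytes an InputStream hands back, one read() at a time.
    static int countBytes(byte[] data) throws IOException {
        int count = 0;
        try (InputStream in = new ByteArrayInputStream(data)) {
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    // Counts the decoded chars a Reader hands back for the same bytes.
    static int countChars(byte[] data) throws IOException {
        int count = 0;
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] euro = "\u20AC".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(euro)); // prints 3
        System.out.println(countChars(euro)); // prints 1
    }
}
```

The stream sees three separate bytes; only the Reader, which knows the charset, reassembles them into one character.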

So to read such characters you can use:

BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "UTF-8"));

Here UTF-8 is the charset.

Similarly, you can print the data. But wherever you print, the program that displays the output (your editor or console) must support those Unicode characters.

You can also refer to the link below for more replies from other people: Read unicode text files with java
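A minimal sketch of reading and then printing, assuming the file really is UTF-8 and the console can display it. The class name, the temporary sample file, and the choice to print through a PrintStream with an explicit encoding are illustrative assumptions, not something the question specifies.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class Utf8ReadAndPrint {

    // Reads the whole file as UTF-8 text; line ends are normalized to '\n'.
    static String readUtf8(File file) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file so the sketch is self-contained.
        File file = File.createTempFile("unicode-sample", ".txt");
        Files.write(file.toPath(), "price: \u20AC 10\n".getBytes(StandardCharsets.UTF_8));

        // Wrap System.out so the output encoding is explicit rather than
        // whatever the platform default happens to be.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.print(readUtf8(file));

        file.delete();
    }
}
```

Whether the euro sign then shows up correctly still depends on the terminal or editor that receives those bytes.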

Upvotes: 1

Joop Eggen

Reputation: 109593

The notation \uXXXX occurs primarily in .java and .properties files, where it is read as a Unicode character (strictly speaking, one UTF-16 code unit). Unicode text (i.e. text using all kinds of special characters) often uses the UTF-8 format, though UTF-16LE and UTF-16BE are also sometimes used.

Such text is read as:

BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "UTF-8"));

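And, for completeness, written with either of:

// a plain Writer with an explicit encoding
Writer writer = new OutputStreamWriter(new FileOutputStream(file), "UTF-8");
// or a PrintWriter with an explicit encoding
PrintWriter printer = new PrintWriter(file, "UTF-8");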

In particular, do not use FileReader and FileWriter: these old utility classes use the platform encoding by default.
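As a rough illustration of writing with an explicit encoding, the sketch below writes a euro sign through an OutputStreamWriter and dumps the bytes that land in the file. The class name, helper names, and temporary file are assumptions made for the example.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class Utf8Write {

    // Writes the text to the file as UTF-8, independent of the platform encoding.
    static void writeUtf8(File file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    // Formats bytes as upper-case hex pairs separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("utf8-write", ".txt");
        writeUtf8(file, "\u20AC");
        System.out.println(toHex(Files.readAllBytes(file.toPath()))); // prints E2 82 AC
        file.delete();
    }
}
```

With a FileWriter and no charset argument, the bytes written would instead depend on the machine's default encoding.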

If the text file itself contained \u20AC, that would be irregular, and it would be printed literally (backslash, u, 20AC).
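If the file really does contain literal backslash-u sequences, one possible approach is to decode them after reading. The following is only a sketch with a hypothetical helper: it handles exactly a backslash, a single u, and four hex digits, and ignores the finer rules the Java compiler and .properties loader apply (such as repeated u characters or escaped backslashes).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapes {

    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces each matched escape with the char whose code it names.
    static String decode(String text) {
        Matcher matcher = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            char decoded = (char) Integer.parseInt(matcher.group(1), 16);
            // quoteReplacement guards against '$' or '\' being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash yields the six literal characters in the string.
        String raw = "price: \\u20AC 10";
        System.out.println(raw.length());          // prints 16
        System.out.println(decode(raw).length());  // prints 11
    }
}
```

After decoding, the six-character escape has become a single euro character, so the string is five chars shorter.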

Now, if you mean there are problems with Unicode characters outside the normal ASCII range, such as the euro symbol €, then it might be a matter of the font, or of a needed conversion, say to Windows Latin 1: "Windows-1252".
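If a conversion to Windows-1252 turns out to be what the printing path needs, it could look roughly like this. The class and method names are illustrative, and the sketch assumes the JDK in use ships the windows-1252 charset, which standard desktop JDKs generally do.

```java
import java.nio.charset.Charset;

class Cp1252Demo {

    // Encodes the text as Windows-1252 bytes; characters the charset cannot
    // represent are replaced by String.getBytes with '?'.
    static byte[] toWindows1252(String text) {
        return text.getBytes(Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        byte[] bytes = toWindows1252("\u20AC");
        System.out.println(bytes.length);                          // prints 1
        System.out.println(String.format("%02X", bytes[0] & 0xFF)); // prints 80
    }
}
```

The euro sign is a single byte, 0x80, in Windows-1252, whereas it takes three bytes in UTF-8, so the receiving side has to agree on which encoding is in use.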

Upvotes: 4
