Roger

Reputation: 6527

reading unicode *.txt files?

Currently I am reading .txt files with

    FileInputStream is = new FileInputStream(masterPath+txt);
    BufferedReader br = new BufferedReader(new InputStreamReader(is));

    String readLine = null;

    while ((readLine = br.readLine()) != null)
    {
        ...

But Unicode characters do not appear as they should.

Any ideas how to change the above code so that Unicode text is read correctly?

Thanks!

Upvotes: 1

Views: 5707

Answers (3)

JB Nizet

Reputation: 692081

Yes. Specify the appropriate encoding when constructing your InputStreamReader. If your file is UTF-8 encoded, use

    new BufferedReader(new InputStreamReader(is, "UTF-8"));
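For reference, here is a minimal sketch of the whole read loop with the encoding given explicitly. It assumes the file really is UTF-8; the class name `Utf8LineReader` and the method `readLines` are hypothetical names chosen for illustration, not part of the original code. It uses `StandardCharsets.UTF_8` (Java 7+) instead of the string `"UTF-8"`, which avoids the checked `UnsupportedEncodingException`.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a text file line by line, decoding it as UTF-8.
final class Utf8LineReader {

    // Returns all lines of the file at `path`.
    // Assumption: the file is UTF-8 encoded; pass a different Charset otherwise.
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the underlying stream).
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

With the names from the question, the call would look like `Utf8LineReader.readLines(masterPath + txt)`.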

Upvotes: 6

hmakholm left over Monica

Reputation: 23342

The plain InputStreamReader constructor will assume that the file has the system's "default encoding". Because it is rather unpredictable what that is, this constructor should not be used except in toy examples. Use one of the two-argument constructors that allow you to specify the encoding explicitly.

By the way, "Unicode" is not sufficient to tell what is in the file you want to read. Unicode, in and of itself, defines just how numbers ("codepoints") are assigned to characters, not how to pack those numbers into bytes in a file, which is the job of an "encoding". In practice your encoding is likely to be either UTF-8 or UTF-16 of some endianness.
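To make the codepoint/encoding distinction concrete, here is a small sketch that encodes one character (U+00E9, "e" with acute accent) in several encodings and shows the resulting bytes. The class `EncodingDemo` and its methods are hypothetical names for illustration. The last method simulates what can go wrong when a reader assumes the wrong encoding, using ISO-8859-1 as a stand-in for a mismatched default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: same codepoint, different bytes depending on the encoding.
final class EncodingDemo {

    // Returns the bytes of `s` in charset `cs` as space-separated uppercase hex.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    // Encodes `s` as UTF-8, then decodes the bytes as ISO-8859-1 (the wrong charset).
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String e = "\u00e9"; // U+00E9
        System.out.println(hex(e, StandardCharsets.UTF_8));    // C3 A9
        System.out.println(hex(e, StandardCharsets.UTF_16BE)); // 00 E9
        System.out.println(hex(e, StandardCharsets.UTF_16LE)); // E9 00
        // One character in, two characters out: U+00C3 followed by U+00A9.
        System.out.println(misdecode(e).length());             // 2
    }
}
```

A single accented letter turning into two odd-looking characters is a typical symptom of UTF-8 data being read with a single-byte default encoding.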

Upvotes: 2

Clement Herreman

Reputation: 10536

Maybe your file isn't Unicode-encoded, or maybe the way you're displaying it isn't Unicode-compliant (Windows cmd.exe, I'm looking at you).
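One way to tell those two cases apart is to inspect the decoded string without relying on the console at all. The sketch below (the class `CodePointDump` and method `describe` are hypothetical names) prints each codepoint as plain ASCII `U+XXXX`, so the output looks the same on any terminal. It needs Java 8+ for `String.codePoints()`.

```java
// Hypothetical diagnostic: shows what a string contains using ASCII-only output.
final class CodePointDump {

    // Returns the codepoints of `s` as space-separated "U+XXXX" tokens.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Correctly decoded "caf" + U+00E9:
        System.out.println(describe("caf\u00e9"));       // U+0063 U+0061 U+0066 U+00E9
        // The same text after a wrong decode (UTF-8 bytes read as ISO-8859-1):
        System.out.println(describe("caf\u00c3\u00a9")); // U+0063 U+0061 U+0066 U+00C3 U+00A9
    }
}
```

If the dump of a line read from the file shows the expected codepoints, the reading code is fine and the display is the likely culprit; if it shows something like the second line, the decoding step is wrong.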

Upvotes: 1
