hba

Reputation: 7790

Java - OS X - Unicode mangled string

I'm processing a Unicode text file using the Java platform on OS X. When I open the file using TextEdit or TextWrangler instead of seeing "Nattvardsgästerna" I see "Nattvardsg‰sterna" (which is incorrect). When I open the file using the Java io stream, I see the same incorrect String "Nattvardsg‰sterna".

When I open the file on my PC I see the correct String. I'm not sure where to start solving this problem... Is it an issue with my OS X set-up? Should I open the Java stream with a special flag?

Thanks.

P.S. I'm opening the file like so: fileReader = new BufferedReader(new FileReader(file));

P.S.S. Also, I should mention that I'd like to output the result as an SQL text file so it is important for the OS to distinguish ä correctly.

Upvotes: 0

Views: 186

Answers (1)

Greg Kopff

Reputation: 16575

An InputStream reads bytes (not characters), so I assume when you say:

When I open the file using java io stream

... that you really mean "when I open the file using a Java Reader".

EDIT: Your comment says that you're doing this:

new BufferedReader(new FileReader(file));

An InputStreamReader has a constructor that allows you to set the character encoding. If you don't specify one, it will use the platform default. It's unlikely the platform default will be a Unicode encoding such as UTF-8 (on my MacBook, it's set to "US-ASCII").

In order to set the character encoding, you must create the intermediate InputStreamReader yourself rather than letting FileReader do it for you (because FileReader uses the platform default encoding).

Assuming the file is encoded using UTF-8, use:

new BufferedReader(new InputStreamReader(new FileInputStream(file), 
                                         Charset.forName("UTF-8")));
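For completeness, here is a minimal sketch of the same idea as a small runnable class. The class name ExplicitCharsetRead, the helper readLines and the path "titles.txt" are made up for illustration, and UTF-8 is still only an assumption about your file. One hint that it may not hold: "‰" is what the byte 0xE4 looks like in Mac OS Roman, and 0xE4 is "ä" in ISO-8859-1 and windows-1252, so if UTF-8 gives you garbage, try StandardCharsets.ISO_8859_1 instead.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ExplicitCharsetRead {

    // Reads every line of the file, decoding its bytes with the given charset
    // instead of the platform default that FileReader would use.
    static List<String> readLines(String path, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "titles.txt" is a placeholder path; UTF-8 is an assumption about the file.
        String path = args.length > 0 ? args[0] : "titles.txt";
        for (String line : readLines(path, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

StandardCharsets.UTF_8 (Java 7 and later) is equivalent to Charset.forName("UTF-8") and avoids the string lookup.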

Alternatively, you can change the platform default by supplying an argument to the JVM. You can look at this answer for the full details, but the basic idea is that you set the file.encoding Java system property. The linked answer provides a few ways to achieve this.
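If you want to see what your JVM is actually using before changing anything, a small check like the following may help. It is only a sketch; the class name ShowDefaultCharset and the helper describeDefaults are made up for illustration.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {

    // Reports the file.encoding system property and the charset that readers
    // and writers fall back to when none is given explicitly.
    static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

Running it once as "java ShowDefaultCharset" and once as "java -Dfile.encoding=UTF-8 ShowDefaultCharset" should show whether the property takes effect on your JVM. Passing the charset explicitly in code, as above, is usually the more robust choice because it doesn't depend on how the program is launched.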

FURTHER EDIT:

P.S.S. Also, I should mention that I'd like to output the result as an SQL text file so it is important for the OS to distinguish ä correctly.

The OS hasn't got anything to do with this. The file system is just shuffling bytes around. How those bytes are interpreted is entirely up to the applications that are reading those files. This answer tells you how to make your Java program interpret the bytes correctly. For your database to be able to interpret the bytes correctly, you'll need to configure the database encoding.
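The same reasoning applies when you write the SQL file: pick the encoding explicitly on the way out too. The sketch below writes a line as UTF-8 and then decodes the very same bytes two different ways, to illustrate that the interpretation belongs to the reader and not to the file. The class name ExplicitCharsetWrite, the path "out.sql" and the table "films" are made up for illustration, and UTF-8 is an assumption about what your database import expects.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ExplicitCharsetWrite {

    // Writes text to the file, encoding characters with the given charset
    // instead of the platform default that FileWriter would use.
    static void writeText(String path, String text, Charset charset) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), charset))) {
            writer.write(text);
        }
    }

    // Decodes the file's raw bytes with the given charset.
    static String decode(String path, Charset charset) throws IOException {
        return new String(Files.readAllBytes(Paths.get(path)), charset);
    }

    public static void main(String[] args) throws IOException {
        String path = "out.sql"; // placeholder path
        // \u00e4 is "ä"; the escape keeps this source file ASCII-only.
        String sql = "INSERT INTO films (title) VALUES ('Nattvardsg\u00e4sterna');\n";
        writeText(path, sql, StandardCharsets.UTF_8);

        // Same bytes, two interpretations: only the first matches what was written.
        System.out.print(decode(path, StandardCharsets.UTF_8));
        System.out.print(decode(path, StandardCharsets.ISO_8859_1));
    }
}
```

Whichever encoding you choose for the file, the database client or import tool has to be told the same one.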

Upvotes: 3
