Nayn

Reputation: 3614

How to ensure a Java program uses UTF-8 encoding

I recently discovered that relying on the default encoding of the JVM causes bugs. I should explicitly use a specific encoding, e.g. UTF-8, when working with Strings, InputStreams, etc. I have a huge codebase to scan to ensure this. Could somebody suggest a simpler way to check this than searching the whole codebase?

Thanks, Nayn

Upvotes: 7

Views: 5256

Answers (4)

BalusC

Reputation: 1109635

Not a direct answer, but to ease the job it's good to know that in any decent IDE you can search for usages of InputStreamReader, OutputStreamWriter, String#getBytes(), String(byte[]), Properties#load(), URLEncoder#encode(), URLDecoder#decode() and the like, wherever you could pass the charset, and then update them accordingly. You should also search for FileReader and FileWriter and replace them with the first two classes mentioned. True, it's a tedious task, but it's worth it, and I'd prefer it over relying on environment specifics.
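A minimal sketch of what the replacements above might look like, assuming Java 7+ (for StandardCharsets). The class and method names (ExplicitCharsetDemo, writeUtf8, readUtf8, loadUtf8) are hypothetical. Note that Java 11 also added charset-taking constructors to FileReader/FileWriter and Java 10 added URLEncoder.encode(String, Charset); the version below sticks to older overloads.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class ExplicitCharsetDemo {
    // Replaces new FileWriter(file): the charset is stated instead of inherited from the JVM.
    public static void writeUtf8(Path file, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(Files.newOutputStream(file), StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    // Replaces new FileReader(file).
    public static String readUtf8(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Properties#load(InputStream) always assumes ISO-8859-1; load(Reader) lets us choose.
    public static Properties loadUtf8(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader r = new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8)) {
            props.load(r);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // "Grüße €", written with escapes so the source file's own encoding doesn't matter
        String text = "Gr\u00FC\u00DFe \u20AC";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);    // instead of text.getBytes()
        String back = new String(bytes, StandardCharsets.UTF_8); // instead of new String(bytes)
        String query = URLEncoder.encode(text, "UTF-8");         // instead of deprecated encode(text)
        String decoded = URLDecoder.decode(query, "UTF-8");      // instead of deprecated decode(query)
        System.out.println(bytes.length);         // 11
        System.out.println(back.equals(decoded)); // true
    }
}
```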

In Eclipse, for example, select the project(s) of interest, press Ctrl+H, switch to the Java Search tab, enter e.g. InputStreamReader, tick Constructor under the Search For option, choose Sources as the only Search In option, and run the search.

Upvotes: 3

BillThor

Reputation: 7586

If the file is manipulated by native tools on the server, you may want to set the encoding to System.getProperty("file.encoding"). I have run into bugs both ways.
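A sketch of picking up the platform's encoding for files shared with native tools. One caveat worth knowing: since JDK 18 (JEP 400) file.encoding defaults to UTF-8 regardless of the platform, and JDK 17+ exposes the platform encoding as the native.encoding system property instead. The class and method names below are hypothetical, and the fallback to file.encoding assumes an older JDK where it was not overridden with -D.

```java
import java.nio.charset.Charset;

public class NativeCharsetDemo {
    // Best guess at the charset native tools on this machine use.
    public static Charset nativeCharset() {
        // JDK 17+: the platform encoding, independent of the JVM default.
        String name = System.getProperty("native.encoding");
        if (name == null) {
            // Older JDKs: file.encoding reflects the platform unless overridden with -D.
            name = System.getProperty("file.encoding");
        }
        if (name == null) {
            return Charset.defaultCharset();
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name: fall back to the JVM default.
            return Charset.defaultCharset();
        }
    }

    public static void main(String[] args) {
        System.out.println("native charset:  " + nativeCharset());
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

Files read or written for native tools would then use e.g. new InputStreamReader(in, NativeCharsetDemo.nativeCharset()), so the choice stays explicit in the code.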

Best practice is to know which character set is used, and set that explicitly. Also, if the file is used to interface with another application, you should define the character set used. This may be a Windows code page or a different UTF format.
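To illustrate why the agreed charset matters, here is a small sketch encoding the same string as windows-1252 (a Windows code page, looked up by name since it is not one of the StandardCharsets constants), UTF-8 and UTF-16LE. The class and helper names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class InterfaceCharsetDemo {
    // Hypothetical agreed charset for the other application; in practice it comes from the interface spec.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    static byte[] encode(String text, Charset cs) {
        return text.getBytes(cs);
    }

    public static void main(String[] args) {
        String s = "caf\u00E9"; // "café"
        System.out.println(encode(s, WINDOWS_1252).length);             // 4
        System.out.println(encode(s, StandardCharsets.UTF_8).length);    // 5
        System.out.println(encode(s, StandardCharsets.UTF_16LE).length); // 8
        // Decoding UTF-8 bytes as windows-1252 silently yields "cafÃ©" instead of an error.
        String wrong = new String(encode(s, StandardCharsets.UTF_8), WINDOWS_1252);
        System.out.println(wrong.equals(s)); // false
    }
}
```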

Upvotes: 0

leonbloy

Reputation: 76016

relying on default encoding of JVM causes bugs

Indeed, one should always specify the charset when encoding/decoding.

If you are satisfied with a single global default charset for all your encoding/decoding (not always enough), you can live with Bozho's answer: specify a known, fixed default in your JVM arguments or in some static initializer.

But it's good practice to search for all implicit charset specifications in your code and replace them with an explicit charset. Some typical methods/classes to look at: FileWriter, FileReader, InputStreamReader, OutputStreamWriter, String#getBytes(), String(byte[]).
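Since the question asks for something simpler than a manual search, here is a rough, hypothetical helper that walks a source tree and lists candidate lines mentioning the classes/methods above. It is a regex heuristic, not a parser: it cannot see types or overloads, so expect false positives (e.g. new String(char[]), or any .load( call) and misses (calls split across lines). Lines that already mention a charset or UTF-8 are skipped. It also assumes the .java sources themselves are UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DefaultCharsetScanner {
    // Calls that may fall back to the JVM default charset (heuristic).
    private static final Pattern SUSPECT = Pattern.compile(
        "\\bnew\\s+(FileReader|FileWriter|InputStreamReader|OutputStreamWriter|String)\\s*\\("
        + "|\\.getBytes\\s*\\(\\s*\\)"
        + "|\\bURL(En|De)coder\\.(encode|decode)\\s*\\("
        + "|\\.load\\s*\\(");
    // A line that already names a charset is assumed to be fine.
    private static final Pattern NAMES_CHARSET = Pattern.compile("(?i)charset|utf-?8|utf_8");

    // Returns 1-based numbers of suspicious lines.
    public static List<Integer> scan(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (SUSPECT.matcher(line).find() && !NAMES_CHARSET.matcher(line).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(root)) {
            sources = walk.filter(p -> p.toString().endsWith(".java")).collect(Collectors.toList());
        }
        for (Path file : sources) {
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            for (int n : scan(lines)) {
                System.out.println(file + ":" + n + ": " + lines.get(n - 1).trim());
            }
        }
    }
}
```

The output is only a review list; each hit still needs a human (or IDE) look to decide which charset is right.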

Upvotes: 0

Bozho

Reputation: 597382

System.getProperty("file.encoding")

returns the JVM's default encoding, which I/O operations use when no charset is specified.

You can set it by passing -Dfile.encoding=utf-8 to the java command.
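A sketch of pairing the JVM argument with a fail-fast check at startup, so a missing flag surfaces immediately instead of as corrupted text later. In practice calling System.setProperty("file.encoding", ...) from code (including a static initializer) usually has no effect, because the default charset is read once early in JVM startup; the command-line flag is the dependable route. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Throws if the JVM default charset is not UTF-8.
    public static void requireUtf8Default() {
        Charset cs = Charset.defaultCharset();
        if (!StandardCharsets.UTF_8.equals(cs)) {
            throw new IllegalStateException(
                "Default charset is " + cs + "; start the JVM with -Dfile.encoding=UTF-8");
        }
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        requireUtf8Default();
    }
}
```

Run it as java -Dfile.encoding=UTF-8 EncodingCheck. This only guards the global default; explicit charsets in the code are still the safer fix.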

Upvotes: 4
