Reputation: 3614
I recently discovered that relying on the JVM's default encoding causes bugs. I should explicitly specify an encoding, e.g. UTF-8, when working with String, InputStream, etc.
I have a huge codebase to scan to ensure this. Could somebody suggest a simpler way to check this than searching the whole codebase?
Thanks Nayn
Upvotes: 7
Views: 5256
Reputation: 1109635
Not a direct answer, but to ease the job it's good to know that in any reasonably decent IDE you can search for usages of InputStreamReader, OutputStreamWriter, String#getBytes(), String(byte[]), Properties#load(), URLEncoder#encode(), URLDecoder#decode() and the like, where a charset can be passed, and then update them accordingly. You should also search for FileReader and FileWriter and replace them with the first two classes mentioned. True, it's a tedious task, but it's worth it, and I'd prefer it over relying on environmental specifics.
In Eclipse, for example, select the project(s) of interest, hit Ctrl+H, switch to the Java Search tab, enter for example InputStreamReader, tick the Search For option Constructor, choose Sources as the only Search In option, and execute the search.
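Once the search has turned up the call sites, the fix at each one is usually mechanical. Below is a minimal sketch of what the updated calls might look like, assuming UTF-8 is the charset you actually want; the class and method names (ExplicitCharsetDemo, readFirstLine, roundTrip) are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // Decodes the stream with an explicit charset instead of the platform default.
    // A FileReader can be replaced the same way by wrapping a FileInputStream.
    static String readFirstLine(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    // Encodes and decodes with the same explicit charset.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8); // instead of text.getBytes()
        return new String(bytes, StandardCharsets.UTF_8);     // instead of new String(bytes)
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9llo";
        InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        System.out.println(readFirstLine(in).equals(text));
        System.out.println(roundTrip(text).equals(text));
    }
}
```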
Upvotes: 3
Reputation: 7586
If the file is manipulated by native tools on the servers, you may want to set the encoding to System.getProperty("file.encoding"). I have run into bugs both ways.
Best practice is to know which character set is used and to set it explicitly. If the file is used to interface with another application, you should also define the character set used. This may be a Windows code page or a different UTF format.
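To illustrate why knowing the actual character set matters, here is a small sketch that decodes the same byte with two different charsets. ISO-8859-1 stands in for "whatever the other application really writes"; a Windows code page would be looked up with Charset.forName instead, and whether a given one is available depends on the JVM. The class name CharsetMismatchDemo is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Decodes the bytes with the charset the caller names explicitly.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // 0xE9 is the letter e with an acute accent in ISO-8859-1,
        // but on its own it is not a valid UTF-8 sequence.
        byte[] bytes = { (byte) 0xE9 };

        String asLatin1 = decode(bytes, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(bytes, StandardCharsets.UTF_8);

        // The two results differ: the wrong charset silently yields a replacement character.
        System.out.println(asLatin1.equals(asUtf8));
    }
}
```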
Upvotes: 0
Reputation: 76016
relying on default encoding of JVM causes bugs
Indeed, one should always specify the charset when encoding/decoding.
If a single global default charset is enough for all your encoding/decoding (it isn't always), you can live with Bozho's answer: specify a known, fixed default in your JVM arguments or in a static initializer.
But it's good practice to search for all implicit charset usages in your code and replace them with an explicit charset. Some typical methods/classes to look at: FileWriter, FileReader, InputStreamReader, OutputStreamWriter, String#getBytes(), String(byte[]).
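If an IDE search is not an option, a crude text scan can produce a first list of candidates. The sketch below is only a heuristic, not a substitute for a real usage search: it matches no-argument getBytes() calls and any FileReader/FileWriter construction by regex, so it will produce false positives and will miss cases such as InputStreamReader or String(byte[]) where the number of arguments matters. It also assumes the source files themselves are UTF-8. The class name ImplicitCharsetScanner is made up.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ImplicitCharsetScanner {

    // Heuristic patterns only: no-arg getBytes(), and any FileReader/FileWriter construction.
    private static final Pattern SUSPECT = Pattern.compile(
            "\\.getBytes\\(\\s*\\)|new\\s+FileReader\\s*\\(|new\\s+FileWriter\\s*\\(");

    // Returns the 1-based numbers of the lines that match a suspect pattern.
    static List<Integer> suspectLines(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (SUSPECT.matcher(lines.get(i)).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Root directory to scan; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> sources;
        try (Stream<Path> paths = Files.walk(root)) {
            sources = paths.filter(p -> p.toString().endsWith(".java"))
                    .collect(Collectors.toList());
        }
        for (Path source : sources) {
            // Assumption: source files are UTF-8; other encodings may fail to decode here.
            List<String> lines = Files.readAllLines(source, StandardCharsets.UTF_8);
            for (int lineNumber : suspectLines(lines)) {
                System.out.println(source + ":" + lineNumber + ": "
                        + lines.get(lineNumber - 1).trim());
            }
        }
    }
}
```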
Upvotes: 0
Reputation: 597382
System.getProperty("file.encoding") returns the VM encoding for I/O operations.
You can set it by passing -Dfile.encoding=utf-8 on the JVM command line.
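To see what a given JVM actually picked up, something like the following sketch can be run with and without the flag. The class name DefaultCharsetCheck is hypothetical, and the printed values depend on the platform and JVM version.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Reports the system property next to the charset the JVM resolved as its default.
    static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

Run it for example as java -Dfile.encoding=utf-8 DefaultCharsetCheck and compare the output with a plain java DefaultCharsetCheck.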
Upvotes: 4