Reputation: 1048
I'm writing a project that parses a UTF-8 encoded file.
I'm doing it this way:
ArrayList<String> al = new ArrayList<>();
BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(new FileInputStream(filename), "UTF8"));
String line = null;
while ((line = bufferedReader.readLine()) != null)
{
    al.add(line);
}
return al;
The strange thing is that it reads the file properly when I run it in IntelliJ, but not when I run it through java -jar
(it gives me garbage characters instead of the UTF-8 text).
What can I do to make it read the UTF-8 content correctly when run through java -jar as well?
Upvotes: 1
Views: 891
Reputation: 10797
I think what is going on here is that your terminal just isn't set up correctly for your default encoding. Basically, if your program runs correctly, it is reading the UTF-8 bytes, storing them as Java strings, and then outputting them to the terminal in whatever the default encoding is. To find out what your default encoding is, see this question. Then you need to make sure that the terminal you run your java -jar
command from is compatible with it. For example, here are my terminal settings/preferences on my Mac.
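To narrow this down, here is a minimal sketch (the class name and test string are just for illustration) that prints the JVM's default charset and then writes a line to the console explicitly as UTF-8, so you can tell whether the problem is your data or your terminal:

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class EncodingCheck {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // The charset System.out falls back to when no encoding is specified.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Write a test line explicitly as UTF-8; if this still shows garbage,
        // the terminal itself is not displaying UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("UTF-8 test: äöü €");
    }
}

You can also try launching with java -Dfile.encoding=UTF-8 -jar yourapp.jar (yourapp.jar being your jar) to force the default charset, though how that property is honored varies between JVM versions.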
Upvotes: 1
Reputation: 14230
Oracle docs give a pretty straightforward answer about Charset:
Standard charsets
Every implementation of the Java platform is required to support the following standard charsets. Consult the release documentation for your implementation to see if any other charsets are supported. The behavior of such optional charsets may differ between implementations.
...
UTF-8
Eight-bit UCS Transformation Format
So you should use new InputStreamReader(new FileInputStream(filename), "UTF-8")
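For reference, here is a minimal sketch of the whole read loop using the StandardCharsets.UTF_8 constant (available since Java 7), which avoids charset-name strings altogether and closes the reader when done; the class and method names are just placeholders:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8Lines {
    // Reads every line of a UTF-8 encoded file into a list.
    static List<String> readUtf8Lines(String filename) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(filename), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}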
Upvotes: 0