Reputation: 1513
Here is the test.properties file.
mycharacters=ýþÿƛƸ
myotherchars=\u00FD\u00FE\u00FF\u019B\u01B8
Here is the code being used:
import java.awt.FlowLayout;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ResourceBundle;
import javax.swing.*;

public class MultiByteTest2
{
    public MultiByteTest2()
    {
        // Loads test.properties from the classpath.
        ResourceBundle bundle = ResourceBundle.getBundle("test");
        JFrame frame = new JFrame("MultiByte Test");
        JPanel panel = new JPanel();
        panel.setLayout(new FlowLayout());
        // label1 uses the literal characters, label2 the \uXXXX escapes.
        JLabel label1 = new JLabel(bundle.getString("mycharacters"));
        JLabel label2 = new JLabel(" --- " + bundle.getString("myotherchars"));
        panel.add(label1);
        panel.add(label2);
        // Print the default encoding as seen through three different APIs.
        String defaultCharacterEncoding = System.getProperty("file.encoding");
        System.out.println("defaultCharacterEncoding by property: " + defaultCharacterEncoding);
        System.out.println("defaultCharacterEncoding by code: " + getDefaultCharEncoding());
        System.out.println("defaultCharacterEncoding by charSet: " + Charset.defaultCharset());
        frame.add(panel);
        frame.setSize(300, 300);
        frame.setLocationRelativeTo(null);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);
    }

    public static void main(String[] args)
    {
        new MultiByteTest2();
    }

    // Reports the charset an InputStreamReader picks when none is specified.
    public static String getDefaultCharEncoding()
    {
        byte[] bArray = {'w'};
        InputStream is = new ByteArrayInputStream(bArray);
        InputStreamReader reader = new InputStreamReader(is);
        return reader.getEncoding();
    }
}
Here is the command used to run the above code, followed by its output, which shows that UTF-8 is being used:
>java -Dfile.encoding=UTF-8 MultiByteTest2
Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8
defaultCharacterEncoding by property: UTF-8
defaultCharacterEncoding by code: UTF8
defaultCharacterEncoding by charSet: UTF-8
Three questions:
Why does using the actual characters result in a mess of characters being output?
Why does using the Unicode representation work?
The output shows UTF-8 instead of cp1252 which indicates the file.encoding is being used, but why does it not help when using the actual characters in the properties file?
Upvotes: 1
Views: 633
Reputation: 109547
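*.properties files are read as ISO-8859-1 (Latin-1), regardless of file.encoding. This is a very old design decision. Characters outside Latin-1 can still be read if they are written as \uXXXX escapes, which is why myotherchars displays correctly while the UTF-8 bytes of mycharacters are misread as Latin-1 characters.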
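To make the difference concrete, here is a minimal sketch that holds the content of test.properties in memory as UTF-8 bytes and reads it twice: once through Properties.load(InputStream), which decodes as ISO-8859-1, and once through Properties.load(Reader) with an explicit UTF-8 reader. The class and method names (PropertiesEncodingDemo, utf8File, loadAsLatin1, loadAsUtf8) are made up for illustration. Note that ResourceBundle in Java 9 and later tries UTF-8 first for .properties files, so the garbling described in the question is what you see on older runtimes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of the original question.
final class PropertiesEncodingDemo {

    // Same two entries as test.properties, encoded as UTF-8 bytes.
    // The first value holds the real characters, the second the \uXXXX escapes.
    static byte[] utf8File() {
        String text = "mycharacters=\u00FD\u00FE\u00FF\u019B\u01B8\n"
                + "myotherchars=\\u00FD\\u00FE\\u00FF\\u019B\\u01B8\n";
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Properties.load(InputStream) always decodes the bytes as ISO-8859-1.
    static Properties loadAsLatin1(byte[] bytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Properties.load(Reader) uses whatever charset the Reader was created with.
    static Properties loadAsUtf8(byte[] bytes) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        byte[] bytes = utf8File();
        // Each of the five characters is two bytes in UTF-8, so the
        // Latin-1 decoding yields ten wrong characters instead of five.
        System.out.println(loadAsLatin1(bytes).getProperty("mycharacters").length()); // 10
        System.out.println(loadAsUtf8(bytes).getProperty("mycharacters").length());   // 5
        // The escaped form is plain ASCII and survives either way.
        System.out.println(loadAsLatin1(bytes).getProperty("myotherchars").length()); // 5
    }
}
```

If you control how the file is opened, passing a UTF-8 Reader to Properties.load is therefore one way to keep literal characters in the file.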
I think the cleanest solution would be to use the Properties class directly, perhaps with XML properties (loadFromXML), where the encoding is declared in the file itself. The XML could also be kept outside the application, which can be useful for internationalisation.
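A minimal sketch of the XML route, assuming the properties are written with Properties.storeToXML and read back with Properties.loadFromXML. The class and method names (XmlPropertiesDemo, toXml, fromXml) are made up for illustration, and the XML is kept in a byte array here rather than in a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class, not part of the original answer.
final class XmlPropertiesDemo {

    // Writes the properties as an XML document declared and encoded as UTF-8.
    static byte[] toXml(Properties props) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.storeToXML(out, "test properties", "UTF-8");
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // loadFromXML decodes according to the encoding named in the XML declaration.
    static Properties fromXml(byte[] xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties original = new Properties();
        // Written as escapes only so this source compiles under any -encoding.
        original.setProperty("mycharacters", "\u00FD\u00FE\u00FF\u019B\u01B8");
        Properties reloaded = fromXml(toXml(original));
        System.out.println(original.equals(reloaded));
    }
}
```

In a real application the byte array would be replaced by a FileInputStream or a classpath resource stream pointing at the XML file.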
One could also keep the *.properties files in UTF-8 and have the Maven build convert them to u-escaped *.properties files before packaging. This is a Maven copy with filtering.
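The conversion step itself is simple. The following is only a sketch of the escaping in plain Java, not a Maven configuration: it rewrites every non-ASCII character as a \uXXXX escape so that the result is safe to read as ISO-8859-1. The class and method names (UnicodeEscaper, escape, readBack) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper, not part of the original answer.
final class UnicodeEscaper {

    // Replaces each char outside the ASCII range with its \uXXXX escape.
    // Working on UTF-16 chars means a surrogate pair becomes two escapes,
    // which is the form the properties format expects.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    // Loads escaped text the way an ISO-8859-1 properties reader would,
    // and returns the value stored under the given key.
    static String readBack(String escapedText, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(
                    escapedText.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String line = "mycharacters=\u00FD\u00FE\u00FF\u019B\u01B8";
        String escaped = escape(line);
        // The escaped line is pure ASCII, so it prints safely on any console.
        System.out.println(escaped);
        System.out.println(readBack(escaped, "mycharacters").length());
    }
}
```

Applied to the mycharacters line of test.properties, this produces the same escapes that the myotherchars line already contains.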
Instead of *.properties files (a PropertyResourceBundle), you could also use a ListResourceBundle, a Java class containing an array of texts. The resource path passed to ResourceBundle can differ slightly with respect to periods and slashes, but this frees you from the properties encoding, because the class is compiled with the IDE project encoding.
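A minimal sketch of the ListResourceBundle variant follows. The names ListBundleDemo, TestBundle and lookup are made up for illustration. The bundle is instantiated directly to keep the example self-contained; in an application you would normally make it a public top-level class and load it by name with ResourceBundle.getBundle. In a source file saved in the project encoding the value could be typed literally as ýþÿƛƸ; escapes are used here only so the sketch compiles under any compiler encoding setting.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo class, not part of the original answer.
final class ListBundleDemo {

    // Class-based replacement for test.properties: keys and texts live in
    // Java source, so no properties-file encoding is involved.
    static final class TestBundle extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"mycharacters", "\u00FD\u00FE\u00FF\u019B\u01B8"},
            };
        }
    }

    // Looks up a text in the class-based bundle.
    static String lookup(String key) {
        ResourceBundle bundle = new TestBundle();
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        // Five characters, exactly as written in the source.
        System.out.println(lookup("mycharacters").length()); // 5
    }
}
```

The returned string can be passed to new JLabel(...) exactly as bundle.getString("mycharacters") is in the question.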
Upvotes: 3