Reputation: 8722
Today one of my testers came to me and said my program had failed her test.
All she did was open all my properties files and re-save them in Unicode format.
Questions:
I've never seen a Java project run an encoding check on its properties files before. But I see her point, because a customer might save a properties file in a different encoding.
Upvotes: 0
Views: 624
Reputation: 1303
Use the native2ascii Java utility to keep your properties files in the proper state.
Upvotes: 0
Reputation: 6981
Are the properties files considered part of the application, or are they user-editable files? In the first case, I don't think it's wrong to make assumptions about how parts of your application are encoded or stored.
If the properties files are targeted at the user, as user-editable files, then the principle applies: you should validate and clean any and all input coming in from outside your application.
The official java.util.Properties documentation states that the encoding is ISO-8859-1.
When saving properties to a stream or loading them from a stream, the ISO 8859-1 character encoding is used. For characters that cannot be directly represented in this encoding, Unicode escapes are used; however, only a single 'u' character is allowed in an escape sequence. The native2ascii tool can be used to convert property files to and from other character encodings.
This can be found here.
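To make the quoted rule concrete, here is a minimal sketch of what the stream-based load does with an escaped file versus a file saved as raw UTF-8. The class name, the helper `loadFromBytes`, and the key `greeting` are made up for illustration; only `Properties.load(InputStream)` comes from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class Latin1PropertiesDemo {

    // Loads properties through the InputStream overload,
    // which always decodes the bytes as ISO-8859-1.
    static Properties loadFromBytes(byte[] bytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Escaped form: plain ASCII bytes containing a Unicode escape,
        // similar to what native2ascii would produce.
        byte[] escaped = "greeting=caf\\u00e9".getBytes(StandardCharsets.US_ASCII);
        // Raw UTF-8 form: the accented letter is stored as two bytes,
        // and each byte is misread as a separate ISO-8859-1 character.
        byte[] rawUtf8 = "greeting=caf\u00e9".getBytes(StandardCharsets.UTF_8);

        System.out.println(loadFromBytes(escaped).getProperty("greeting"));
        System.out.println(loadFromBytes(rawUtf8).getProperty("greeting"));
    }
}
```

The first value comes back as the intended text, while the second comes back as mojibake (two characters where one was meant). That is probably the kind of failure the tester ran into.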
Upvotes: 3
Reputation: 1500535
As others have said, the encoding for properties files read using streams is fixed at ISO-8859-1. You can't really validate this terribly easily - although checking whether the file starts with the UTF-8 byte order mark wouldn't be a bad idea.
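A rough sketch of such a byte order mark check might look like the following. The class and method names are hypothetical. The UTF-16 cases go beyond the UTF-8 check suggested above; I added them on the assumption that "save as Unicode" in some editors means UTF-16 with a BOM.

```java
class BomCheck {

    // Returns the encoding implied by a leading byte order mark,
    // or null if the data does not start with one of the known marks.
    // Note: a UTF-32LE mark also begins with FF FE and would be
    // reported as UTF-16LE by this simple check.
    static String detectBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        return null;
    }

    // Compares the first bytes of data, as unsigned values, with the prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x6B, 0x3D, 0x76};
        String bom = detectBom(sample);
        System.out.println(bom == null ? "no BOM found" : "BOM found: " + bom);
    }
}
```

A file that passes this check can still be in the wrong encoding, since a BOM is optional in UTF-8. It only catches the obvious cases.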
As of Java 6, however, you can provide a Reader to Properties.load instead of a Stream. If it's still an option, you might want to start using that and mandate UTF-8, which is going to be a heck of a lot easier for many people to use than ISO-8859-1 and the \uxxxx escaping.
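For reference, a minimal sketch of the Reader-based approach, assuming you decide to mandate UTF-8. The class name, the helper `loadUtf8`, and the key `greeting` are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class Utf8PropertiesLoader {

    // Decodes the bytes as UTF-8 and passes the resulting characters
    // to the Reader overload of Properties.load (available since Java 6).
    static Properties loadUtf8(byte[] bytes) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        byte[] utf8File = "greeting=caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(loadUtf8(utf8File).getProperty("greeting"));
    }
}
```

For a real file you would wrap a `FileInputStream` the same way, or use `Files.newBufferedReader(path, StandardCharsets.UTF_8)` on Java 7 and later. One caveat: the JDK's UTF-8 decoder does not strip a leading byte order mark, so a file saved with a BOM would end up with U+FEFF glued to the front of its first key unless you skip it yourself.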
Upvotes: 0
Reputation: 75456
Even though the spec allows Latin-1 in properties files, the common practice is ASCII.
To be safe, files in any other charset need to be converted to ASCII using native2ascii.
We ran into the same issue when we started to use native encodings: some files were in Latin-1 and others in UTF-8, and the two are not compatible. So stay with ASCII.
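If you do standardize on ASCII, a check along these lines could be run at startup or in the build. This is only a sketch, and the class and method names are made up.

```java
import java.nio.charset.StandardCharsets;

class AsciiCheck {

    // Returns the index of the first byte outside the 7-bit ASCII range,
    // or -1 if every byte is ASCII.
    static int firstNonAsciiIndex(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if ((bytes[i] & 0x80) != 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] clean = "key=value".getBytes(StandardCharsets.US_ASCII);
        byte[] dirty = "k=caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstNonAsciiIndex(clean));
        System.out.println(firstNonAsciiIndex(dirty));
    }
}
```

This flags Latin-1, UTF-8, and BOM-prefixed files that contain anything beyond ASCII. It would not notice UTF-16 text saved without a BOM, because the extra bytes there are zeros, which fall inside the 7-bit range.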
Upvotes: 0