janetsmith

Reputation: 8722

Do we need to check the encoding scheme when reading a properties file?

Today one of my testers came to me and said my program had failed her test.

All she did was open each of my properties files and save it in Unicode format.

Questions:

  1. Is it industry practice to check the encoding of every properties file?
  2. How do you deal with this problem?

I've never seen a Java project run an encoding check on its properties files before. But I see her point, because a customer might save a properties file in a different encoding.

Upvotes: 0

Views: 624

Answers (4)

gedevan

Reputation: 1303

Use the native2ascii Java utility to convert your properties files to the escaped ASCII form that java.util.Properties expects.
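
As far as I know, native2ascii shipped with the JDK up to Java 8 and was removed in JDK 9. Where the tool is not available, the escaping it performs can be approximated in a few lines. This is only a rough sketch of the idea, not the tool's actual implementation, and the class and method names (AsciiEscaper, escapeNonAscii) are made up for illustration.

```java
public class AsciiEscaper {

    /**
     * Returns the text with every character outside 7-bit ASCII replaced by
     * a backslash-u escape followed by four hex digits. Characters beyond the
     * Basic Multilingual Plane come out as two escapes (a surrogate pair).
     */
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append("\\u").append(String.format("%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input is "name=cafe" with an e-acute as its last character.
        System.out.println(escapeNonAscii("name=caf\u00e9"));
    }
}
```

Running main prints name=caf\u00e9, which is the form a stream-based Properties.load reads back as the accented character.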

Upvotes: 0

Stef

Reputation: 6981

Are the properties files considered part of the application, or are they user-editable files? In the first case, I don't think it's wrong to make assumptions about how parts of your application are encoded or stored.

If the properties files are targeted at the user, as user-editable files, then the principle applies: you should validate and clean any and all input coming in from outside your application.

The official java.util.Properties documentation states that the encoding is ISO-8859-1:

When saving properties to a stream or loading them from a stream, the ISO 8859-1 character encoding is used. For characters that cannot be directly represented in this encoding, Unicode escapes are used; however, only a single 'u' character is allowed in an escape sequence. The native2ascii tool can be used to convert property files to and from other character encodings.

The quote is from the class-level Javadoc of java.util.Properties.
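
To make the consequence of that rule concrete, here is a small sketch that loads the same value twice through the InputStream overload: once written with a Unicode escape, once saved as raw UTF-8. The class name Latin1LoadDemo and the helper loadFromBytes are hypothetical; only Properties.load(InputStream) is the real API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class Latin1LoadDemo {

    /** Loads properties via the InputStream overload, which decodes bytes as ISO-8859-1. */
    public static Properties loadFromBytes(byte[] bytes) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(bytes));
        return props;
    }

    public static void main(String[] args) throws IOException {
        String cafe = "caf\u00e9"; // "cafe" with an e-acute

        // Case 1: an ASCII-only file that uses a Unicode escape. Decodes correctly.
        byte[] escaped = "name=caf\\u00e9\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(cafe.equals(loadFromBytes(escaped).getProperty("name"))); // true

        // Case 2: the same text saved as raw UTF-8. Each byte of the two-byte
        // sequence is read as a separate Latin-1 character, so the value is garbled.
        byte[] utf8 = ("name=" + cafe + "\n").getBytes(StandardCharsets.UTF_8);
        System.out.println(cafe.equals(loadFromBytes(utf8).getProperty("name"))); // false
    }
}
```

Note that the second load does not throw. The file is accepted and the value is silently wrong, which is why this kind of failure tends to show up only in testing.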

Upvotes: 3

Jon Skeet

Reputation: 1500535

As others have said, the encoding for properties files read using streams is fixed at ISO-8859-1. You can't really validate this terribly easily - although checking whether the file starts with the UTF-8 byte order mark wouldn't be a bad idea.
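
A byte order mark check along those lines might look like the sketch below. The UTF-16 cases are an addition beyond what is suggested above, included on the assumption that a file saved as "Unicode" by a Windows editor is usually UTF-16 with a BOM. BomSniffer and detectBom are hypothetical names, and UTF-32 is not distinguished from UTF-16LE here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomSniffer {

    /**
     * Returns the encoding implied by a leading byte order mark in the first
     * {@code length} bytes of {@code head}, or null if no known BOM is present.
     */
    public static String detectBom(byte[] head, int length) {
        if (length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        if (length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        return null;
    }

    /** Reads up to three bytes from the start of the file and checks them for a BOM. */
    public static String detectBom(Path file) throws IOException {
        byte[] head = new byte[3];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int r;
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (n < head.length && (r = in.read(head, n, head.length - n)) != -1) {
                n += r;
            }
        }
        return detectBom(head, n);
    }
}
```

A non-null result is a strong hint that the file was re-saved in an encoding the stream-based loader will misread. A null result proves nothing, since UTF-8 files are often written without a BOM.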

As of Java 6, however, you can provide a Reader to Properties.load instead of an InputStream. If it's still an option, you might want to start using that and mandate UTF-8, which is going to be a heck of a lot easier for many people to use than ISO-8859-1 and the \uxxxx escaping.
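
A minimal sketch of the Reader approach, assuming Java 7 or later for Files and StandardCharsets (on Java 6 an InputStreamReader with an explicit charset name would play the same role). Utf8PropertiesLoader is a hypothetical name.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class Utf8PropertiesLoader {

    /** Loads a properties file, decoding it as UTF-8 instead of ISO-8859-1. */
    public static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }
}
```

Two things to keep in mind with this sketch. The reader returned by Files.newBufferedReader should reject byte sequences that are not valid UTF-8 with an IOException, which gives you a basic encoding check for free. And a UTF-8 byte order mark is not stripped by the decoder, so a file that starts with one will have it glued onto its first key unless you skip it yourself.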

Upvotes: 0

ZZ Coder

Reputation: 75456

Even though the spec allows Latin-1 in properties files, the common practice is to stick to ASCII.

To be safe, any other charset should be converted to ASCII using native2ascii.

We ran into the same issue when we started to use native encodings: some files were in Latin-1 and others in UTF-8, and the two are not compatible. So stay with ASCII.
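
If you do standardize on ASCII, enforcing it is cheap. The sketch below reports the first byte that falls outside 7-bit ASCII so a build step or startup check can reject the file. AsciiCheck and firstNonAsciiOffset are hypothetical names, and reading the whole file into memory assumes properties files stay small.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class AsciiCheck {

    /** Returns the offset of the first byte outside 7-bit ASCII, or -1 if all bytes are ASCII. */
    public static int firstNonAsciiOffset(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            // Java bytes are signed, so values 0x80..0xFF show up as negative numbers.
            if (data[i] < 0) {
                return i;
            }
        }
        return -1;
    }

    /** Convenience overload that checks a file on disk. */
    public static int firstNonAsciiOffset(Path file) throws IOException {
        return firstNonAsciiOffset(Files.readAllBytes(file));
    }
}
```

Because every byte order mark starts with a non-ASCII byte, this check also flags files that were re-saved as UTF-8 with a BOM or as UTF-16 with a BOM.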

Upvotes: 0
