Al-Khwarizmi

Reputation: 461

Reading UTF-8 .properties files in Java 1.5?

I have a project where everything is in UTF-8. I was using the Properties.load(Reader) method to read properties files in this encoding. But now, I need to make the project compatible with Java 1.5, and the mentioned method doesn't exist in Java 1.5. There is only a load method that takes an InputStream as a parameter, which is assumed to be in ISO-8859-1.

Is there any simple way to make my project 1.5-compatible without having to change all the .properties files to ISO-8859-1? I don't really want to have a mix of encodings in my project (encodings are already a time sink one at a time, let alone when you mix them), nor do I want to convert my entire project to ISO-8859-1.

By "a simple way" I mean "without creating a custom Properties class from scratch".

Upvotes: 2

Views: 3120

Answers (6)

Bijju

Reputation: 1

In my experience, it helps to save all .java files as UTF-8 as well (not only the properties files where you store UTF-8 characters). That way there is no need to use an InputStreamReader either. Also, make sure to compile with UTF-8 encoding.

This worked for me without passing any extra UTF-8 parameter.

To test this, write a simple stub program in Eclipse, open the Properties dialog of that .java file, and in the Resource section set the text file encoding to UTF-8.

Upvotes: -1

Jagger

Reputation: 10524

In my projects I keep my properties in UTF-8 files with the extension .uproperties and convert them at build time to ISO-8859-1 .properties files using native2ascii.exe. This lets me maintain my properties in UTF-8, and the Ant script does everything else for me.

Upvotes: 1

Joop Eggen

Reputation: 109547

Depending on your build engine you can \uXXXX-escape the properties into the build target directory. Maven can filter them via the native2ascii-maven-plugin.
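
As a rough illustration of what \uXXXX-escaping does to a properties file, here is a minimal sketch. The class name UnicodeEscaper is made up for this example, and this is not the actual native2ascii implementation; it only shows the idea that every character outside US-ASCII is replaced by an escape that Properties.load(InputStream) understands.

```java
public final class UnicodeEscaper {
    private UnicodeEscaper() {}

    /** Replaces every char outside US-ASCII with its four-digit \uXXXX escape. */
    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                // Plain US-ASCII passes through unchanged.
                out.append(c);
            } else {
                // Everything else becomes \u plus the zero-padded hex code.
                out.append("\\u");
                String hex = Integer.toHexString(c);
                for (int pad = hex.length(); pad < 4; pad++) {
                    out.append('0');
                }
                out.append(hex);
            }
        }
        return out.toString();
    }
}
```

The escaped output is pure ASCII, so it is valid in ISO-8859-1 and in UTF-8 at the same time, which is why the escaped files can be loaded by the Java 1.5 method without any encoding mismatch.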

Upvotes: 1

Andrew Thompson

Reputation: 168815

One strategy that might work for this situation is as follows:

  1. Read the bytes of the file into a ByteArrayOutputStream.
  2. Once that is completed, call toByteArray() (but see below).
  3. With the byte[], construct a ByteArrayInputStream.
  4. Use the ByteArrayInputStream in Properties.load(InputStream).

As pointed out, the above failed to actually convert the character set from UTF-8 to ISO-8859-1. To fix that, a tweak.

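After the BAOS has been filled, instead of calling toByteArray():

  1. Call toString("UTF-8") to decode the UTF-8 bytes into a String.
  2. Call String.getBytes("ISO-8859-1") to get the byte[] in the encoding that Properties.load(InputStream) expects. Characters that have no ISO-8859-1 equivalent cannot be represented this way and are lost.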
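
The steps above could be sketched roughly as follows. This is only an illustration: the class name Utf8ToLatin1Loader is invented for the example, it decodes the buffered bytes as UTF-8 and re-encodes them as ISO-8859-1, and it assumes the file contains only characters that exist in ISO-8859-1.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public final class Utf8ToLatin1Loader {
    private Utf8ToLatin1Loader() {}

    /** Loads UTF-8 encoded properties by re-encoding them as ISO-8859-1 first. */
    public static Properties load(InputStream utf8In) throws IOException {
        // Step 1: copy all bytes of the input into a ByteArrayOutputStream.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n = utf8In.read(buf); n != -1; n = utf8In.read(buf)) {
            baos.write(buf, 0, n);
        }
        // Step 2: decode the bytes as UTF-8, then encode them as ISO-8859-1.
        // Assumption: every character in the file exists in ISO-8859-1.
        String text = baos.toString("UTF-8");
        byte[] latin1 = text.getBytes("ISO-8859-1");
        // Steps 3 and 4: wrap the bytes and use the Java 1.5 load method.
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(latin1));
        return props;
    }
}
```

For files that may contain characters outside ISO-8859-1, escaping them as \uXXXX instead of re-encoding would avoid the data loss.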

Upvotes: 2

Archimedes Trajano

Reputation: 41220

You can start a thread that reads the data using a BufferedReader and writes it to a PipedOutputStream, which is connected to a PipedInputStream that load reads from.

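// Connect the two ends of the pipe: the thread writes, load() reads.
PipedOutputStream pos = new PipedOutputStream();
PipedInputStream pis = new PipedInputStream(pos);
ReaderRunnable reader = new ReaderRunnable(pos, new File("utfproperty.properties"));
Thread t = new Thread(reader);
t.start();
Properties properties = new Properties();
properties.load(pis); // returns once the writer closes its end of the pipe
t.join();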

The thread reads the data from the BufferedReader one character at a time. If a character is outside the US-ASCII (i.e. low 7-bit) range, it writes "\u" followed by the character's four-digit hexadecimal code to the PipedOutputStream; otherwise it writes the character unchanged.

ReaderRunnable would be a class that looks like:

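import java.io.File;
import java.io.OutputStream;

public class ReaderRunnable implements Runnable {
  private final OutputStream os;
  private final File f;
  public ReaderRunnable(OutputStream os, File f) {
    this.os = os;
    this.f = f;
  }
  public void run() {
    // open the file with a BufferedReader that decodes UTF-8
    // read it one character at a time, writing US-ASCII characters to os as-is
    // and any other character as a \uXXXX escape
    // close os when done so that load() sees the end of the stream
  }
}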
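
One way the skeleton above might be filled in is sketched below. The names PipedUtf8Properties and EscapingRunnable are invented for this example, and unlike the skeleton it takes an InputStream rather than a File so that it is easy to try out; treat it as an illustration of the design, not a hardened implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.Reader;
import java.util.Properties;

public final class PipedUtf8Properties {
    private PipedUtf8Properties() {}

    /** Writer side: decodes UTF-8 and emits US-ASCII with \uXXXX escapes. */
    static final class EscapingRunnable implements Runnable {
        private final InputStream in;
        private final OutputStream os;
        private volatile IOException failure;

        EscapingRunnable(InputStream in, OutputStream os) {
            this.in = in;
            this.os = os;
        }

        public void run() {
            try {
                Reader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
                for (int c = reader.read(); c != -1; c = reader.read()) {
                    if (c < 0x80) {
                        os.write(c); // US-ASCII passes through unchanged
                    } else {
                        os.write(String.format("\\u%04x", c).getBytes("US-ASCII"));
                    }
                }
            } catch (IOException e) {
                failure = e; // reported to the caller after join()
            } finally {
                // Closing the pipe is what lets Properties.load() see end-of-stream.
                try { os.close(); } catch (IOException ignored) { }
            }
        }
    }

    /** Reader side: loads the escaped stream with the Java 1.5 load method. */
    public static Properties load(InputStream utf8In) throws IOException, InterruptedException {
        PipedOutputStream pos = new PipedOutputStream();
        PipedInputStream pis = new PipedInputStream(pos);
        EscapingRunnable writer = new EscapingRunnable(utf8In, pos);
        Thread t = new Thread(writer);
        t.start();
        Properties props = new Properties();
        props.load(pis);
        t.join();
        if (writer.failure != null) {
            throw writer.failure;
        }
        return props;
    }
}
```

Because the Reader hands back UTF-16 code units, a character outside the Basic Multilingual Plane is written as two consecutive escapes (a surrogate pair), which Properties.load turns back into the same two chars.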

After writing all that, it occurred to me that someone must have had this problem before and solved it, and the best place to look for such things is Apache Commons. Fortunately, they have an implementation there.

https://commons.apache.org/io/apidocs/org/apache/commons/io/input/ReaderInputStream.html

The Apache implementation is not without flaws, though: even if your input file is UTF-8, it may only contain characters from the ISO-8859-1 character set. The design I provided above can handle characters outside that set.

Upvotes: 1

kan

Reputation: 28951

Could you use XML properties instead? As I understand the spec, .properties files should be in ISO-8859-1; if you want other characters, they should be escaped, for example with the native2ascii tool.
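
A minimal sketch of the XML route, assuming the properties are rewritten in the XML format that Properties.loadFromXML (available since Java 1.5) expects. The class name XmlPropertiesExample and the sample entry are made up for illustration; the encoding is taken from the XML declaration, so the file can stay in UTF-8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public final class XmlPropertiesExample {
    private XmlPropertiesExample() {}

    /** Loads properties from the XML format; the XML declaration names the encoding. */
    public static Properties load(InputStream xmlIn) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(xmlIn); // also closes the stream when it returns
        return props;
    }

    /** A sample document, as it might be stored in a UTF-8 encoded file. */
    public static String sampleXml() {
        // \u00e9 is e-acute; the escape only keeps this Java source file ASCII.
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
             + "<properties>\n"
             + "  <entry key=\"greeting\">caf\u00e9</entry>\n"
             + "</properties>\n";
    }
}
```

The trade-off is that every .properties file has to be converted to the XML layout, which is more verbose than key=value lines.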

Upvotes: 3
