rajani chowdhary

Reputation: 175

reading unicode characters from properties file java

I need to read Unicode escapes from a properties file in Java exactly as they are written. For example, if I pass the key "Account.label.register", it should return the literal text "\u5BC4\u5B58\u5668", not the decoded characters "寄存器". Here is my sample properties file:

file_ch.properties

Account.label.register = \u5BC4\u5B58\u5668 
Account.label.login = \u767B\u5F55 
Account.label.username = \u7528\u6237\u540D 
Account.label.password = \u5BC6\u7801 

Thank you.

I am reading the properties file using the following Java code:

@Override
public ResourceBundle getTexts(String bundleName) {
    ResourceBundle myResources = null;
    try {
        myResources = ResourceBundle.getBundle(bundleName, getLocale());
    } catch (Exception e) {
        myResources = ResourceBundle.getBundle(getDefaultBundleKey(), getLocale());
    }
    return myResources;
}

With this approach I get the Chinese characters correctly. But for some of the AJAX requests in my application I need to pass the Chinese text in an X-JSON header. Sample code is given below:

    HashMap<String, List<String>> map = new HashMap<String, List<String>>();
    List<String> errors = new ArrayList<String>();
    errors.add(str);   /*ex: str = "无效的代码" , value taken from properties file through resource bundle*/
    map.put("ERROR", errors);
    JSONObject json = JSONObject.fromObject(map);
    response.setCharacterEncoding("UTF-8");
    response.setHeader("X-JSON", json.toString());
    response.setStatus(500);

When str is English, for example str = "Invalid Code", the X-JSON header carries the text unchanged. But when str = "无效的代码" (Chinese or any other non-English text), the header value arrives empty. This is the response I get for the English text:

 response :

 connection:close
 Content-Encoding:gzip
 Content-Type:text/html;charset=UTF-8
 Date:Wed, 08 Jun 2016 10:17:43 GMT
 Server:Apache-Coyote/1.1
 Transfer-Encoding:chunked
 Vary:Accept-Encoding
 X-JSON:{"ERROR":["Invalid Code"]}

However, if the error contains Chinese text, for example "无效的代码":

response :

 connection:close
 Content-Encoding:gzip
 Content-Type:text/html;charset=UTF-8
 Date:Wed, 08 Jun 2016 10:17:43 GMT
 Server:Apache-Coyote/1.1
 Transfer-Encoding:chunked
 Vary:Accept-Encoding
 **X-JSON:{"ERROR":["  "]}**   /*expecting the response X-JSON:{"ERROR":["无效的代码"]}*/

Because the Chinese text arrives empty, I thought of sending Unicode escapes through the X-JSON header instead:

{"ERROR":["\u65E0\u6548\u7684\u4EE3\u7801"]}  

After that, I want to decode the Unicode escapes in JavaScript once the X-JSON header has been evaluated, like this:

var json;
try {
    json = xhr.getResponseHeader('X-Json');
} catch (e) {
    alert(e);
}

if (json) {
    var data = eval('(' + json + ')');
    decodeMsg(data);
}

function decodeMsg(message) {
    var mssg = message;
    var r = /\\u([\d\w]{4})/gi;
    mssg = mssg.replace(r, function (match, grp) {
        return String.fromCharCode(parseInt(grp, 16));
    });
    mssg = unescape(mssg);

    return mssg;
}

Please give suggestions. Thank you.

Upvotes: 4

Views: 10144

Answers (2)

Joop Eggen

Reputation: 109613

Update of answer:

Originally, .properties files were read as ISO-8859-1 (Latin-1), which covers characters such as é and ö. Anything outside that range had to be written with \uXXXX escapes.

However, newer Java versions (Java 9 and later) try UTF-8 first when loading property resource bundles, so you can keep the .properties file in UTF-8, which is a tremendous improvement.
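
As a rough sketch of reading UTF-8 properties with the decoding made explicit: PropertyResourceBundle accepts a Reader, so the charset does not depend on the Java version. The class name Utf8BundleDemo and the method loadUtf8 are made up for illustration, and the file content is supplied as in-memory bytes instead of a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical helper, not part of the original answer.
class Utf8BundleDemo {

    // Decodes the given bytes as UTF-8 and parses them as a properties bundle.
    static ResourceBundle loadUtf8(byte[] utf8Bytes) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the bytes of a UTF-8 encoded file_ch.properties.
        byte[] fileBytes = "Account.label.login = 登录\n".getBytes(StandardCharsets.UTF_8);
        ResourceBundle bundle = loadUtf8(fileBytes);
        // The value holds the two Chinese characters, with no escaping needed in the file.
        System.out.println(bundle.getString("Account.label.login").length()); // prints 2
    }
}
```

For a real file, the same Reader could wrap a FileInputStream or a classpath resource stream.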


Original answer (.properties in ISO-8859-1, as it has been since Java 1):

The problem is that HTTP header lines are in ISO-8859-1 (basic Latin-1), so the Chinese characters cannot be sent as they are. The general solution is to percent-encode (%XX) the UTF-8 bytes. For JSON, however, you are better served by simply doing what you intended.

So you want to send u-escaped Unicode, using \uXXXX. Since not only Java but also JavaScript/JSON understands this convention, you only need to do the u-escaping in Java on the server.
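
For reference, the %XX alternative could look roughly like the sketch below. It uses java.net.URLEncoder and URLDecoder from the JDK; the class name HeaderPercentEncoding and both method names are hypothetical. Note that URLEncoder follows form encoding, so a space becomes "+" instead of "%20", which matters if the client decodes with decodeURIComponent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of the original answer.
class HeaderPercentEncoding {

    // Percent-encodes the UTF-8 bytes of the value, so the result is plain ASCII.
    static String encodeForHeader(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Reverses encodeForHeader.
    static String decodeFromHeader(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encodeForHeader("无效");
        System.out.println(encoded); // prints %E6%97%A0%E6%95%88
        System.out.println(decodeFromHeader(encoded).equals("无效")); // prints true
    }
}
```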

static String uescape(String s) {
    StringBuilder sb = new StringBuilder(s.length() * 6);
    for (int i = 0; i < s.length(); ++i) {
        char ch = s.charAt(i);
        if (ch < 128) {
            // Plain ASCII is copied unchanged.
            sb.append(ch);
        } else {
            // Non-ASCII: backslash, 'u', then four zero-padded hex digits.
            sb.append(String.format("\\u%04X", (int) ch));
        }
    }
    return sb.toString();
}

errors.add(uescape(str));

This writes every non-ASCII char (>= 128) as a zero-padded, four-digit hex escape, which is exactly the \uXXXX format.

Or use StringEscapeUtils.escapeJava from Apache Commons, which also escapes quotes, \n and the like, and is much safer.

Upvotes: 3

Jesper

Reputation: 206996

Escape the backslashes in your properties file by doubling them:

Account.label.register = \\u5BC4\\u5B58\\u5668 
Account.label.login = \\u767B\\u5F55 
Account.label.username = \\u7528\\u6237\\u540D 
Account.label.password = \\u5BC6\\u7801 
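
A small sketch of the difference, using java.util.Properties on in-memory text instead of a real file (the class name EscapedPropertiesDemo and the method load are made up for illustration). With a single backslash the loader decodes the escape into the character; with a doubled backslash it yields the literal escape text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of the original answer.
class EscapedPropertiesDemo {

    // Parses the given text as a properties file and returns the value for the key.
    static String load(String fileContent, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContent));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Java string literals need their own escaping: the first literal below
        // is the file text with single backslashes, the second with doubled ones.
        String single = "Account.label.login = \\u767B\\u5F55\n";
        String doubled = "Account.label.login = \\\\u767B\\\\u5F55\n";

        // Single backslash: decoded to the two Chinese characters.
        System.out.println(load(single, "Account.label.login").length());  // prints 2
        // Doubled backslash: the 12-character escape text is kept as written.
        System.out.println(load(doubled, "Account.label.login").length()); // prints 12
    }
}
```

ResourceBundle.getBundle parses .properties files with the same escape rules, so the doubled form should behave the same way there.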

Upvotes: 1
