Sathish Kumar S

Reputation: 43

Text encoding produces junk characters in Play! 1.2.4 framework

Issue: Text is corrupted by character encoding in the Play! 1.2.4 framework.

Context: We are trying to store the text "《我叫MT繁體版》台港澳專屬伺服器上線!" from an input text field in MySQL using the Play! 1.2.4 framework.

Steps that we followed:

1) A UI gets the input from the user. Text in any language will do, so we tried Japanese characters. Note: the page is set to UTF-8 character encoding.

2) On submission to the Play! controller, the controller just reads the input and stores it using a Play! model. The snippet is shown below:

public static void text_create() throws UnsupportedEncodingException,
        ParseException {
    System.out.println("params :: text string value :: "    + params.get("text"));

    String oldString = params.get("text");

    // Converting the input string (which is in UTF-8 format) and parsing it as Windows-1252
    String newString = new String(oldString.getBytes(), "WINDOWS-1252");        

    // 1. passing encoded text to mysql. 
    // 2. TextCheck table and the column 'text' has encoding and collation format as UTF-8.
    // 3. TextCheck > text column mentioned as String in model.
    TextCheck a = new TextCheck(newString);

    List<Object> text = TextCheck.TextList();
    render(a,text);
}

It stores the TEXT value as "《我�MT�體版》�港澳專屬伺�器上線�"

The problem is that there are � characters in the stored value. When I read this raw data from MySQL using other platforms such as Java, Ruby, or some other language, the text is converted, but those � characters come out as junk.

Note: Interestingly, when I read it back through the same Play! framework, it all looks fine; even the junk characters are read correctly.

Question: Why do those junk characters appear?

Upvotes: 1

Views: 650

Answers (1)

Duncan Jones

Reputation: 69339

The problem is the following line:

String newString = new String(oldString.getBytes(), "WINDOWS-1252");

This looks like nonsense to me. Java stores all strings internally using UTF-16, so you can't adjust the encoding of a Java string in the manner you've attempted here.

The getBytes() method returns the bytes of the string using the default platform encoding. You then convert these bytes into a new string using a (probably) different charset. The result is almost certain to be broken.
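As a rough illustration of the effect, here is a minimal sketch that encodes a string with one charset and decodes it with another. The class EncodingDemo and the helpers misdecode and roundTrip are hypothetical names made up for this example, and it assumes the platform default in your case is UTF-8, which the question does not confirm. The second sample character encodes in UTF-8 to the bytes E5 8F AB, and 0x8F has no assignment in Windows-1252, so Java's decoder substitutes U+FFFD (�) and the original byte is lost.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Encodes with one charset and decodes with another, mimicking
    // new String(oldString.getBytes(), "WINDOWS-1252") when the
    // platform default is UTF-8 (an assumption for this sketch).
    static String misdecode(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    // Encodes and decodes with the same charset; for UTF-8 this is lossless.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // First characters of the sample text, written as Unicode escapes
        // so the source file's own encoding does not matter.
        String original = "\u6211\u53ebMT";

        String broken = misdecode(original, StandardCharsets.UTF_8, cp1252);
        String intact = roundTrip(original, StandardCharsets.UTF_8);

        System.out.println("unchanged after mismatch:   " + broken.equals(original));
        System.out.println("contains U+FFFD:            " + (broken.indexOf('\uFFFD') >= 0));
        System.out.println("unchanged after round trip: " + intact.equals(original));
    }
}
```

If that assumption holds, the likely fix in the controller is to drop the conversion and pass params.get("text") to TextCheck unchanged, leaving any charset handling to the database connection settings.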

Upvotes: 1
