Jishnu

Reputation: 25

Saving and getting Arabic text in MySQL using Java

While saving an Arabic word into a MySQL table, I am getting a string in which each block starts with &# and ends with a semicolon. I am using JSP as the front end. In a JSP I can display the string in Arabic, but when I pass the string into form:input the data is shown as unreadable code.

I have spent a lot of time trying to fix this. Maybe it is not really an issue, but how can I convert this into the actual Arabic word in Java? Any suggestion will be helpful.

Upvotes: 0

Views: 140

Answers (1)

Joop Eggen

Reputation: 109547

Those are numeric HTML entities.
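To make the format concrete, here is a minimal sketch of how Arabic text maps to such entities. The class and method names are hypothetical, and the rule "replace everything outside ASCII" is a simplifying assumption: a real browser only replaces the characters that the form's encoding cannot represent.

```java
public class NumericEntityDemo {

    // Hypothetical helper: writes every non-ASCII code point as a decimal
    // numeric entity (&#N;), where N is the Unicode code point.
    public static String toNumericEntities(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);                  // ASCII stays as it is
            } else {
                sb.append("&#").append(cp).append(';');  // e.g. U+0645 becomes &#1605;
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Arabic letters MEEM and REH, written as escapes to keep the source ASCII-only
        System.out.println(toNumericEntities("\u0645\u0631"));
    }
}
```

A string of this shape is what ends up in the MySQL column when the entities are stored without being decoded first.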

The data probably comes from an HTML form. The browser sent the text inputs as numeric entities because the form did not indicate that the server accepts an encoding that can represent them. Assuming UTF-8:

The HTML page itself should declare the correct encoding, just for good measure.

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
</head>

The form should declare that the server accepts that encoding, so that numeric entities are not needed.

<form action="/action_page.php" accept-charset="UTF-8">

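Repairing values that are already stored as entities:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = ...
// Matches decimal (&#1605;) and hexadecimal (&#x645;) numeric entities, case-insensitively.
Pattern pattern = Pattern.compile("(?i)&#((x[A-F0-9]+)|\\d+);");
Matcher m = pattern.matcher(s);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    String code = m.group(1);
    // Parse explicitly: Integer.decode would read a leading zero (&#065;) as octal.
    int codePoint = (code.startsWith("x") || code.startsWith("X"))
            ? Integer.parseInt(code.substring(1), 16)
            : Integer.parseInt(code);
    m.appendReplacement(sb, "");   // copy the text before the entity, drop the entity itself
    sb.appendCodePoint(codePoint); // append the decoded character
}
m.appendTail(sb); // copy whatever follows the last entity
s = sb.toString();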

The numeric entities come in two forms: &#65; in base 10, and &#x3F; in base 16.
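For reference, a self-contained sketch that handles both forms could look like the following. The class and method names are made up for illustration. Unlike the snippet above, it leaves an entity untouched when its number is out of range or is not a valid Unicode code point; that is a design choice of this sketch, not something the question requires.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntityDecoder {

    // Group 1 captures hexadecimal digits (&#x3F;), group 2 decimal digits (&#65;).
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));");

    public static String decode(String s) {
        Matcher m = NUMERIC_ENTITY.matcher(s);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(s, last, m.start());          // text between entities
            int codePoint = parse(m.group(1), m.group(2));
            if (Character.isValidCodePoint(codePoint)) {
                sb.appendCodePoint(codePoint);      // decoded character
            } else {
                sb.append(m.group());               // keep the entity as it was
            }
            last = m.end();
        }
        sb.append(s, last, s.length());             // text after the last entity
        return sb.toString();
    }

    // Returns -1 when the digits do not fit into an int.
    private static int parse(String hex, String dec) {
        try {
            return hex != null ? Integer.parseInt(hex, 16) : Integer.parseInt(dec);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("&#1605;&#1585;&#1581;&#1576;&#1575; &#65;&#x3F;"));
    }
}
```

Named entities such as &amp; are not touched by this sketch; only the numeric ones are decoded.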

Alternatively, StringEscapeUtils.unescapeHtml4 from Apache Commons will probably work satisfactorily.

Upvotes: 1
