Yogesh Prajapati

Reputation: 4870

Issue while saving Non-English character

We are working on an application that needs to save data in Gujarati.

The technologies used in the application are listed below.

My JSP is configured with

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

And

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Hibernate configuration is

<prop key="hibernate.connection.useUnicode">true</prop>
<prop key="hibernate.connection.characterEncoding">UTF-8</prop>
<prop key="hibernate.connection.charSet">UTF-8</prop>

MySQL URL is

jdbc:mysql://host:port/dbName?useUnicode=true&connectionCollation=utf8_general_ci&characterSetResults=utf8

The POJO has a String field to store that data.

The MySQL column is a VARCHAR with charset=utf8 and collation=utf8_general_ci.

When I try to save any non-English (Gujarati) characters, garbage characters are stored, like àª?à«?àª? for "ગુજ".

Is there any other configuration that I have missed here?

Upvotes: 5

Views: 3644

Answers (5)

MS Ibrahim

Reputation: 1789

Your applicationContext file should be like this:

To make a Spring MVC application support internationalization, register two beans:

  1. SessionLocaleResolver: Register a "SessionLocaleResolver" bean and name it exactly "localeResolver". It resolves the locale by reading a predefined attribute from the user's session. Note: if you do not register any "localeResolver", the default AcceptHeaderLocaleResolver is used, which resolves the locale by checking the Accept-Language header of the HTTP request.

  2. LocaleChangeInterceptor: Register a "LocaleChangeInterceptor" interceptor and reference it from any handler mapping that needs to support multiple languages. The "paramName" property is the name of the request parameter used to set the locale.

    <bean id="localeResolver"
        class="org.springframework.web.servlet.i18n.SessionLocaleResolver">
        <property name="defaultLocale" value="en" />
    </bean>
    
    <bean id="localeChangeInterceptor"
        class="org.springframework.web.servlet.i18n.LocaleChangeInterceptor">
        <property name="paramName" value="language" />
    </bean>
    
    <bean class="org.springframework.web.servlet.mvc.support.ControllerClassNameHandlerMapping" >
        <property name="interceptors">
           <list>
            <ref bean="localeChangeInterceptor" />
           </list>
        </property>
    </bean>
    
    <!-- Register the bean -->
    <bean class="com.common.controller.WelcomeController" />
    
    <!-- Register the welcome.properties -->
    <bean id="messageSource"
        class="org.springframework.context.support.ResourceBundleMessageSource">
        <property name="basename" value="welcome" />
    </bean>
    
    <bean id="viewResolver"
        class="org.springframework.web.servlet.view.InternalResourceViewResolver" >
        <property name="prefix">
            <value>/WEB-INF/pages/</value>
        </property>
        <property name="suffix">
            <value>.jsp</value>
        </property>
    </bean>
    

native2ascii is a handy tool built into the JDK that converts a file containing "non-Latin-1" or "non-Unicode" characters into "Unicode-encoded" (\uXXXX) characters.

Native2ascii example

  1. Create a file (source.txt)

Create a file named "source.txt", put some Chinese characters inside, and save it in UTF-8 format.

  2. native2ascii

Use the native2ascii command to convert it into Unicode escapes.

C:\>native2ascii -encoding utf8 c:\source.txt c:\output.txt

native2ascii reads all the characters from "c:\source.txt", treating the file as UTF-8, and writes the escaped characters to "c:\output.txt".

  3. Read Output

Open "c:\output.txt" and you will see the encoded characters, e.g. \ufeff\u6768\u6728\u91d1
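Where the native2ascii tool is not at hand, the same kind of escaping can be sketched in plain Java. This is a minimal illustration, not the tool's actual implementation; the class and method names (UnicodeEscaper, toUnicodeEscapes) are made up for this example.

```java
public final class UnicodeEscaper {
    private UnicodeEscaper() {}

    /** Replaces every char above 0x7F with a backslash-u escape of four hex digits. */
    public static String toUnicodeEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                // Non-ASCII char: emit the escaped form, lowercase hex like native2ascii.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints a line suitable for a .properties file.
        System.out.println("welcome.springmvc = "
                + toUnicodeEscapes("\u5feb\u4e50\u5b66\u4e60"));
    }
}
```

The loop works on UTF-16 chars, so a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair), which is also how .properties files represent it.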

welcome.properties

welcome.springmvc = \u5feb\u4e50\u5b66\u4e60

Read the above string in your application and store the value in the database.

To display it inside a JSP page, remember to add the line

<%@ page contentType="text/html;charset=UTF-8" %>

at the top of the JSP page; otherwise the page may not be able to display the UTF-8 (Chinese) characters properly.

Upvotes: 2

Venkatvasan

Reputation: 491

I faced the same problem while inserting Tamil characters into the database. After a lot of searching I found a working solution that solved my problem, so I am sharing it here. I hope it helps clear up your doubts about non-English characters.

INSERT INTO 
STUDENT(name,address) 
VALUES 
(N'பெயர்', N'முகவரி');

I am using a sample table since you have not provided your table structure or field names.

Upvotes: 7

Master Slave

Reputation: 28519

Another tip: don't rely only on setting characterEncoding as a Hibernate property (<prop key="hibernate.connection.characterEncoding">UTF-8</prop>). Make sure you also add it explicitly as a connection parameter on the DB URL, like so:

jdbc:mysql://host:port/dbName?useUnicode=true&characterEncoding=UTF-8&connectionCollation=utf8_general_ci&characterSetResults=utf8

Also, as there are so many layers where the encoding could be lost, you can try to isolate the failing layer and update the question, e.g. whether it happens when storing to the DB or at some point before that.
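One way to isolate the layer is to log the code points of the value at each step (in the controller, just before the Hibernate save, after reading it back). The helper below is a minimal sketch; the class and method names (EncodingProbe, codePoints) are hypothetical and not part of any framework.

```java
public final class EncodingProbe {
    private EncodingProbe() {}

    /** Lists the code points of a string, e.g. "U+0A97 U+0AC1 U+0A9C". */
    public static String codePoints(String value) {
        StringBuilder out = new StringBuilder();
        value.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // A correctly decoded Gujarati string shows code points in U+0A80..U+0AFF.
        System.out.println(codePoints("\u0A97\u0AC1\u0A9C"));
    }
}
```

If the value already shows code points such as U+00E0 and U+00AA in the controller, the damage happened while decoding the request, before Hibernate or MySQL were involved. If it is still in the Gujarati range right before the save, the connection or database settings are the more likely culprit.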

Upvotes: 3

Paulius Matulionis

Reputation: 23415

There are a couple of things you might have missed. I had the same problem with MySQL on Linux, and what I had to do was edit my.cnf like this:

[client]
default-character-set = utf8

[mysqld]
character-set-server = utf8

For example, on CentOS this file is located at /etc/my.cnf; on Windows (my PC) it is C:\ProgramData\MySQL\MySQL Server 5.5\my.ini. Please note that ProgramData might be hidden.

The other thing, if you are using Tomcat, is that you have to specify UTF-8 for URI encoding. Just edit server.xml and modify your main Connector element:

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8"
           redirectPort="8443" />

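Also make sure you have added a character encoding filter to your application:

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;

@WebFilter(filterName = "CharacterEncodingFilter", urlPatterns = {"/*"})
public class CharacterEncodingFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig)
            throws ServletException {
    }

    @Override
    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain)
            throws IOException, ServletException {
        // Must run before any request parameter is read, otherwise it has no effect.
        servletRequest.setCharacterEncoding("UTF-8");
        // Applies to every matched URL; adjust if the filter also covers non-HTML resources.
        servletResponse.setContentType("text/html; charset=UTF-8");

        filterChain.doFilter(servletRequest, servletResponse);
    }

    @Override
    public void destroy() {
    }

}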
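To see why the request encoding matters, the sketch below decodes the same percent-encoded form value once as UTF-8 and once as ISO-8859-1, which is the default a servlet container typically falls back to when no request encoding is set. It only uses java.net.URLDecoder and URLEncoder; the class name FormDecodingDemo is made up for this illustration and it is not how the container is implemented internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public final class FormDecodingDemo {
    private FormDecodingDemo() {}

    /** Decodes a percent-encoded form value using the given charset name. */
    public static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // What a browser sends for the three Gujarati code points when the page is UTF-8.
        String posted = URLEncoder.encode("\u0A97\u0AC1\u0A9C", "UTF-8");
        System.out.println(posted);

        // Decoded with the right charset: the original 3 characters come back.
        System.out.println(decode(posted, "UTF-8").length());

        // Decoded with the wrong charset: each of the 9 bytes becomes its own character.
        System.out.println(decode(posted, "ISO-8859-1").length());
    }
}
```

The wrongly decoded 9-character string is what then travels on to Hibernate and MySQL, which is how garbage like the value in the question can be stored even when the database side is configured correctly.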

Hope this helps.

Upvotes: 4

Rick James

Reputation: 142278

I am assuming you want ગુજ (GA JA with Vowel sign U)?

I think you somehow specified "latin5". (Yes I see you have UTF-8 everywhere, but "latin5" is the only way I can make things work.)

CONVERT(CONVERT(UNHEX('C3A0C2AAC297C3A0C2ABC281C3A0C2AAC29C')
       USING utf8) USING latin5) = 'ગુજ'

Plus you ended up with "double encoding"; I suspect this is what happened:

  • The client had characters encoded as utf8 (good); and
  • SET NAMES latin5 was used, but it lied by claiming that the client had latin5 encoding; and
  • The column in the table declared CHARACTER SET utf8 (good).
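The sequence above can be reproduced outside MySQL with a few lines of Java. This is only a sketch of the mechanism: it uses ISO-8859-1 as a stand-in for latin5 (an assumption; the two charsets agree on the byte values involved here), and the class name DoubleEncodingDemo is made up.

```java
import java.nio.charset.StandardCharsets;

public final class DoubleEncodingDemo {
    private DoubleEncodingDemo() {}

    /** Upper-case hex dump of a byte array. */
    public static String hex(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    /** Simulates UTF-8 bytes being misread as a single-byte charset and stored as UTF-8 again. */
    public static byte[] doubleEncode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);              // client sends UTF-8
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);   // server believes it is single-byte
        return misread.getBytes(StandardCharsets.UTF_8);                  // column stores it as utf8
    }

    public static void main(String[] args) {
        String gujarati = "\u0A97\u0AC1\u0A9C";
        System.out.println(hex(gujarati.getBytes(StandardCharsets.UTF_8))); // E0AA97E0AB81E0AA9C
        System.out.println(hex(doubleEncode(gujarati)));                    // C3A0C2AAC297C3A0C2ABC281C3A0C2AAC29C
    }
}
```

The second line of output matches the hex string passed to UNHEX above, which is consistent with the double-encoding explanation.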

If possible, it would be better to start over -- empty the tables, be sure to have SET NAMES utf8 or establish utf8 when connecting from your client to the database. Then repopulate the tables.

If you would rather try to recover the existing data, this might work:

UPDATE ... SET col = CONVERT(BINARY(
                         CONVERT(col USING latin5))
                         USING utf8);

But you would need to do that for each messed up column in each table.

A partial test of that code is to do

SELECT CONVERT(BINARY(
                         CONVERT(col USING latin5))
                         USING utf8)
     FROM table;

I say "partial test" because output that looks right does not prove that it is right.

After the UPDATE, SELECT HEX(col) should return E0AA97E0AB81E0AA9C for ગુજ. Note that most Gujarati characters should have hex of the form E0AAyy or E0AByy. You might also find 20 for a blank space.
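The same undo-and-check idea can be tried on a single value in Java before touching the tables. This is a minimal sketch under the same assumption as the diagnosis (the text was misread as a single-byte charset; ISO-8859-1 stands in for latin5 here), and the names MojibakeRepair, repair and utf8Hex are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public final class MojibakeRepair {
    private MojibakeRepair() {}

    /** Reverses one round of "UTF-8 bytes misread as a single-byte charset". */
    public static String repair(String doubleEncoded) {
        // Each char of the garbled text is really one byte of the original UTF-8.
        byte[] originalBytes = doubleEncoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    /** Upper-case hex of the UTF-8 encoding, comparable to MySQL's HEX(col). */
    public static String utf8Hex(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The garbled form of the three Gujarati code points, written with escapes.
        String garbled = "\u00E0\u00AA\u0097\u00E0\u00AB\u0081\u00E0\u00AA\u009C";
        String fixed = repair(garbled);
        System.out.println(utf8Hex(fixed)); // E0AA97E0AB81E0AA9C
    }
}
```

Note that repair only makes sense for values that really are double-encoded: any character above U+00FF cannot be mapped back to a byte and is replaced with "?", so values that were stored correctly would be corrupted by it.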

I apologize for not being more certain. I have been tackling Character Set issues for a decade, but this is a new variant.

Upvotes: 5
