Rob Hruska

Reputation: 120446

Cannot get Servlet to process request content as UTF-8

I'm converting a legacy app from ISO-8859-1 to UTF-8, and I've used a number of resources to determine what I need to set to get this to work. However, after several configuration, code, and environment changes, my Servlet (in Tomcat 5) doesn't seem to process submitted HTML form content as UTF-8.

Here's what I've set up for configuration.

System locale:

[user@server ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Tomcat's server.xml connector:

<Connector protocol="HTTP/1.1"
    ...
    URIEncoding="UTF-8"
    useBodyEncodingForURI="true"/>

JSP page directive and meta tag:

<%@ page language="java" pageEncoding="UTF-8" contentType="text/html;charset=UTF-8" %>
...
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

Servlet filter (CharsetFilter):

public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
{
    if(request.getCharacterEncoding() == null)
    {
        request.setCharacterEncoding("UTF-8");
    }
    ...

With some debug logs I know the following:

System.getProperty("file.encoding"): "UTF-8"
java.nio.charset.Charset.defaultCharset(): "UTF-8"
new OutputStreamWriter(new ByteArrayOutputStream()).getEncoding(): "UTF8"

However, when I submit my form with an input containing "Бить баклуши", I see the following (from my logs):

request.getParameter("myParameter") = Ð\221иÑ\202Ñ\214 баклÑ\203Ñ\210Ð

I know that the request's character encoding was null, so my servlet filter explicitly set it to "UTF-8". Also, I'm viewing my logs from a terminal whose encoding I know is set to UTF-8 as well.
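A quick way to sanity-check the symptom: the logged value is consistent with the form's UTF-8 bytes being decoded as ISO-8859-1, which is the default a servlet container falls back to when no request encoding has taken effect. For example, \221 is octal for byte 0x91, the second UTF-8 byte of "Б". The sketch below reproduces that in plain Java. The class and method names are made up for illustration, and nothing here touches the servlet API.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode the text as UTF-8 (what the browser submits), then decode
    // those bytes as ISO-8859-1 (what happens if the wrong charset is used).
    static String decodeUtf8AsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Undo the damage: ISO-8859-1 maps every byte to exactly one char,
    // so the original bytes can be recovered and decoded as UTF-8.
    static String redecodeAsUtf8(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Бить баклуши", written with escapes so the source file encoding does not matter
        String input = "\u0411\u0438\u0442\u044c \u0431\u0430\u043a\u043b\u0443\u0448\u0438";
        String garbled = decodeUtf8AsLatin1(input);

        System.out.println(garbled.length());                       // prints 23
        System.out.println(Integer.toHexString(garbled.charAt(0))
                + " " + Integer.toHexString(garbled.charAt(1)));    // prints d0 91
        System.out.println(redecodeAsUtf8(garbled).equals(input));  // prints true
    }
}
```

If the repaired string matches the input, the bytes arrived intact and only the decoding step used the wrong charset, which points at when (or whether) setCharacterEncoding took effect rather than at the browser or the terminal.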

What am I missing here? What else do I need to set for the Servlet to correctly process my input as UTF-8? If more information will help, I'll be glad to add more debugging and update this question with it.

Edit (solution):

My web.xml definition for my CharsetFilter was too far down (below my servlet configurations and other filters). I moved the filter definition to the very top of the web.xml document and everything worked correctly. See the accepted answer below.

Upvotes: 2

Views: 18169

Answers (2)

Amin

Reputation: 21

At first I thought the issue would be settled easily, but it took me 2 days to figure it out. Here are my findings; I hope they help.

1) You need the following directive in your JSP:

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

If you have many JSP pages, you can instead set the page encoding once for all of them in web.xml (a jsp-property-group with a page-encoding element), as explained here: How can I cleanly set the pageEncoding of all my JSPs?

2) Make sure you set the character encoding to UTF-8 before you read any parameter in your servlet:

request.setCharacterEncoding("UTF-8");

I do this in my own filter (the first filter, before chain.doFilter is called).

3) Your database must support UTF-8, so make sure you have applied the change to your tables and columns. To check, type in a few words in Japanese and save them. If the table holds the content intact, the database side is fine.

4) The last and most important one is the connection string to your database. Even though my database and all my tables supported UTF-8, I could only save my content correctly after adding this extra parameter. So be sure to add characterEncoding=UTF8 to your connection string, like this:

jdbc:mysql://127.0.0.1:3306/my_database?characterEncoding=UTF8
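If the connection string is assembled in code, one small thing to watch is whether the URL already carries other parameters, since the first one is introduced with "?" and the rest with "&". Here is a minimal sketch of that; the class name, the helper name, and the useSSL parameter in the usage lines are only illustrative assumptions, not something from my setup.

```java
public class JdbcUrlExample {

    // Append characterEncoding=UTF8 to a MySQL JDBC URL unless a
    // characterEncoding parameter is already present.
    static String withUtf8(String jdbcUrl) {
        if (jdbcUrl.contains("characterEncoding=")) {
            return jdbcUrl; // leave an explicit setting alone
        }
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "characterEncoding=UTF8";
    }

    public static void main(String[] args) {
        // No existing query string: the parameter is introduced with '?'
        System.out.println(withUtf8("jdbc:mysql://127.0.0.1:3306/my_database"));
        // Existing query string: the parameter is appended with '&'
        System.out.println(withUtf8("jdbc:mysql://127.0.0.1:3306/my_database?useSSL=false"));
    }
}
```

The resulting string is what you would pass to DriverManager.getConnection along with your user name and password.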

For JSP pages whose forms use enctype="multipart/form-data" you need one extra step: when you read a FileItem with the getString method, call getString("UTF-8") instead, and that should work fine.

Upvotes: 2

akarnokd

Reputation: 70017

Edit4 (the final and corrected answer as requested)

Your servlet filter gets applied too late.

A possible proper order in web.xml would be as follows:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
    "http://java.sun.com/j2ee/dtds/web-app_2.3.dtd">

<web-app>
    <!--CharsetFilter start--> 
    <filter>
        <filter-name>Charset Filter</filter-name>
        <filter-class>CharsetFilter</filter-class>
        <init-param>
            <param-name>requestEncoding</param-name>
            <param-value>UTF-8</param-value>
        </init-param>
    </filter>
    <!-- The rest is omitted -->
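Why the order matters: setCharacterEncoding only has an effect if it is called before the request parameters are first read. If anything that runs earlier touches a parameter, the body has already been decoded with the default ISO-8859-1 and a later call changes nothing. The sketch below models that behaviour with a toy request class. ToyRequest and the helper methods are hypothetical stand-ins written for this illustration, not the servlet API and not Tomcat's implementation, and the Charset overloads of URLEncoder and URLDecoder assume Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class FilterOrderDemo {

    /** Toy stand-in for a request (hypothetical, NOT the real servlet API). */
    static class ToyRequest {
        private final String rawBody;                            // url-encoded form body
        private Charset encoding = StandardCharsets.ISO_8859_1;  // default when none is set
        private Map<String, String> params;                      // parsed lazily, once

        ToyRequest(String rawBody) {
            this.rawBody = rawBody;
        }

        // Takes effect only while the parameters are still unparsed.
        void setCharacterEncoding(Charset charset) {
            if (params == null) {
                encoding = charset;
            }
        }

        // The first call parses the whole body with whatever encoding is current.
        String getParameter(String name) {
            if (params == null) {
                params = new HashMap<>();
                for (String pair : rawBody.split("&")) {
                    String[] kv = pair.split("=", 2);
                    String value = kv.length > 1 ? kv[1] : "";
                    params.put(URLDecoder.decode(kv[0], encoding),
                               URLDecoder.decode(value, encoding));
                }
            }
            return params.get(name);
        }
    }

    // Build the body the way a UTF-8 page would submit it.
    static String formBody(String name, String value) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Charset filter runs first: the encoding is set before any parameter is read.
    static String readWithCharsetFilterFirst(String body, String name) {
        ToyRequest request = new ToyRequest(body);
        request.setCharacterEncoding(StandardCharsets.UTF_8);
        return request.getParameter(name);
    }

    // Something earlier reads a parameter, then the charset filter runs too late.
    static String readWithCharsetFilterLate(String body, String name) {
        ToyRequest request = new ToyRequest(body);
        request.getParameter("anything");                     // earlier code touches parameters
        request.setCharacterEncoding(StandardCharsets.UTF_8); // ignored: already parsed
        return request.getParameter(name);
    }

    public static void main(String[] args) {
        String text = "\u0411\u0438\u0442\u044c"; // "Бить"
        String body = formBody("myParameter", text);
        System.out.println(readWithCharsetFilterFirst(body, "myParameter").equals(text)); // prints true
        System.out.println(readWithCharsetFilterLate(body, "myParameter").equals(text));  // prints false
    }
}
```

In the real application the "earlier code" would presumably be another filter mapped ahead of the charset filter, which is why moving the charset filter to the top fixes it.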

Upvotes: 6
