Guillaume

Reputation: 1879

How to set request encoding in Tomcat?

I have a problem in my Java webapp.

Here is the code in index.jsp:

<%@page contentType="text/html" pageEncoding="UTF-8" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
   "http://www.w3.org/TR/html4/loose.dtd">

<% request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");
%>

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <title>JSP Page</title>
    </head>
    <body>
        <h1>Hello World!</h1>

        <form action="index.jsp" method="get">
            <input type="text" name="q"/>
        </form>

        Res: <%= request.getParameter("q") %>
    </body>
</html>

When I capture a request with Wireshark, my browser sends this header:

GET /kjd/index.jsp?q=%C3%A9 HTTP/1.1\r\n
...
Accept-Charset: UTF-8,*\r\n

And the Tomcat server returns this:

Content-Type: text/html;charset=UTF-8\r\n

But if I send "é" (%C3%A9 in UTF-8) in my form, "Ã©" is displayed instead.

What I understand is that the browser sends an "é" encoded with UTF-8 (the %C3%A9).

But the server interprets this as ISO-8859-1, so %C3 is decoded as Ã and %A9 as ©, and the response is then sent back encoded in UTF-8.
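The suspected mechanism can be checked outside Tomcat. The following is a minimal sketch, not Tomcat's actual decoding code: it percent-decodes the same query value under both charsets using the JDK's URLDecoder. The class and method names are hypothetical, and the Charset overload of URLDecoder.decode assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryDecodingDemo {

    // Percent-decodes a query value and interprets the resulting bytes
    // in the given charset (hypothetical helper, Java 10+).
    static String decode(String raw, Charset charset) {
        return URLDecoder.decode(raw, charset);
    }

    // Formats each char as U+XXXX so the console's own encoding
    // cannot distort what is printed.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "%C3%A9"; // the two UTF-8 bytes of "é", as sent by the browser

        // Two characters: U+00C3 (Ã) and U+00A9 (©)
        System.out.println("ISO-8859-1: " + codePoints(decode(raw, StandardCharsets.ISO_8859_1)));

        // One character: U+00E9 (é)
        System.out.println("UTF-8:      " + codePoints(decode(raw, StandardCharsets.UTF_8)));
    }
}
```

The ISO-8859-1 result is the two-character string seen in the page, which supports the idea that the query string is being decoded as ISO-8859-1 before the response is written out as UTF-8.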

In the code, the requests should be decoded with UTF-8:

request.setCharacterEncoding("UTF-8");

But if I send this URL:

http://localhost:8080/kjd/index.jsp?q=%E9

the "%E9" is decoded with ISO-8859-1 and an "é" is displayed.

Why isn't this working? Why are requests decoded with ISO-8859-1?

I've tried it on Tomcat 6 and 7, and on Windows and Ubuntu.

Upvotes: 25

Views: 68737

Answers (2)

Divyesh Kanzariya

Reputation: 3789

You just need to uncomment the following section of conf/web.xml (the Tomcat server's web.xml). It defines a filter that is applied to every request and sets the character encoding to UTF-8.

 <!-- A filter that sets character encoding that is used to decode -->
 <!-- parameters in a POST request -->
 <filter>
        <filter-name>setCharacterEncodingFilter</filter-name>
        <filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
        <init-param>
            <param-name>encoding</param-name>
            <param-value>UTF-8</param-value>
        </init-param>
 </filter>

  <!-- The mapping for the Set Character Encoding Filter -->
  <filter-mapping>
        <filter-name>setCharacterEncodingFilter</filter-name>
        <url-pattern>/*</url-pattern>
  </filter-mapping>

That's it. It works fine in Tomcat.

Upvotes: 11

BalusC

Reputation: 1109715

The request.setCharacterEncoding("UTF-8"); call only sets the encoding of the request body (which is used by POST requests), not the encoding of the request URI (which is used by GET requests).

You need to set the URIEncoding attribute to UTF-8 in the <Connector> element of Tomcat's /conf/server.xml to get Tomcat to parse the request URI (and the query string) as UTF-8. In Tomcat 6 and 7 this indeed defaults to ISO-8859-1. See also the Tomcat HTTP Connector documentation.

<Connector ... URIEncoding="UTF-8">

Or, to ensure that the URI is parsed using the same encoding as the body¹:

<Connector ... useBodyEncodingForURI="true">
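To illustrate why the connector's charset is the deciding factor, here is a small sketch (hypothetical class and method names, plain JDK only) showing that a value mis-decoded as ISO-8859-1 still carries the original bytes. ISO-8859-1 maps every byte to exactly one character, so re-encoding recovers the bytes, which can then be read as UTF-8. This is meant as a demonstration of the mechanism, not as a substitute for configuring the connector.

```java
import java.nio.charset.StandardCharsets;

public class MisdecodeRoundTrip {

    // Turns a string that was wrongly decoded as ISO-8859-1 back into its
    // original bytes, then decodes those bytes as UTF-8.
    static String reinterpretAsUtf8(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What a connector decoding as ISO-8859-1 would hand back for ?q=%C3%A9
        String misdecoded = "\u00C3\u00A9"; // "Ã©"

        String repaired = reinterpretAsUtf8(misdecoded);

        // The repaired value is the single character U+00E9 (é).
        System.out.println(repaired.length() == 1 && repaired.charAt(0) == '\u00E9');
    }
}
```

With URIEncoding="UTF-8" on the connector, the value arrives correctly decoded and no such repair step is needed.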


¹ From Tomcat's documentation (emphasis mine):

This setting is present for compatibility with Tomcat 4.1.x, where the encoding specified in the contentType, or explicitly set using Request.setCharacterEncoding method was also used for the parameters from the URL. The default value is false.


Please get rid of those scriptlets in your JSP. The request.setCharacterEncoding("UTF-8"); call is made at the wrong moment: it would come too late once you properly use a servlet to process the request. You'd be better off using a filter for this. The response.setCharacterEncoding("UTF-8"); part is already done implicitly by pageEncoding="UTF-8" at the top of the JSP.

I also strongly recommend replacing the old-fashioned <%= request.getParameter("q") %> scriptlet with EL ${param.q}, or with JSTL XML escaping ${fn:escapeXml(param.q)} to prevent XSS attacks.
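To show what XML escaping buys you, here is a rough Java sketch of the idea. The escapeXml helper below is a hypothetical stand-in written for illustration, not JSTL's actual implementation, and the exact entities it emits are an assumption.

```java
public class EscapeXmlSketch {

    // Replaces the characters that are significant in HTML/XML markup with
    // entities, so user input is rendered as text instead of being parsed.
    static String escapeXml(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A query value that would run as script if echoed unescaped.
        String q = "<script>alert('x')</script>";
        System.out.println(escapeXml(q));
    }
}
```

Echoing the raw parameter, as the scriptlet does, would let that input run as script in the page. The escaped form is displayed as literal text.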

Upvotes: 59
