Reputation: 3595
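I have a web application (well, in fact it is just a servlet) which receives data from 3 different sources:
A. A page encoded in UTF-8, with a <form method="get">.
B. A page encoded in ISO-8859-1, with a <form method="get">, too.
C. A page encoded in ISO-8859-1, with a link: <a href="http://my-servlet-url?param=value&param2=value2&etc">.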
The servlet receives the request params and URL-decodes them using UTF-8. As you can expect, A works without problems, while B and C fail (you can't URL-decode in UTF-8 something that's encoded in ISO-8859-1...).
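To make the failure concrete, here is a minimal sketch using only java.net.URLEncoder and java.net.URLDecoder. The class name CharsetMismatchDemo and the sample value are made up for illustration; the real servlet code is not shown in the question, so this only approximates what happens when a value encoded by an ISO-8859-1 page is decoded as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not taken from the original servlet.
final class CharsetMismatchDemo {

    // URL-encodes the value with one charset, then URL-decodes it with another.
    static String roundTrip(String value, String encodeCharset, String decodeCharset) {
        try {
            String encoded = URLEncoder.encode(value, encodeCharset);
            return URLDecoder.decode(encoded, decodeCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset", e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String value = "caf\u00e9"; // "cafe" with an acute accent on the e

        // What a UTF-8 page (case A) sends: caf%C3%A9
        System.out.println(URLEncoder.encode(value, "UTF-8"));
        // What an ISO-8859-1 page (cases B and C) sends: caf%E9
        System.out.println(URLEncoder.encode(value, "ISO-8859-1"));

        // Same charset on both sides: the value survives.
        System.out.println(roundTrip(value, "UTF-8", "UTF-8").equals(value));
        // ISO-8859-1 in, UTF-8 out: the lone byte 0xE9 is not valid UTF-8,
        // so the accented character is lost.
        System.out.println(roundTrip(value, "ISO-8859-1", "UTF-8").equals(value));
    }
}
```

The single byte 0xE9 is an incomplete multi-byte sequence in UTF-8, which is why the decoder cannot recover the original character.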
I can make slight modifications to B and C, but I am not allowed to change them from ISO-8859-1 to UTF-8, which would solve all the problems.
In B, I've been able to solve the problem by adding accept-charset="UTF-8" to the <form>, so it sends the data in UTF-8 even though the page is in ISO-8859-1.
What can I do to fix C?
Alternatively, is there any way to determine the charset on the servlet, so I can call URL-decode with the right encoding in each case?
Edit: I've just found this, which seems to solve my problem. I still have to run some tests to determine whether it impacts performance, but I think I'll stick with that solution.
Upvotes: 1
Views: 4152
Reputation: 3595
I'm answering myself in order to mark the question as solved:
I found this question, which covers exactly the same problem I was facing. The javax.servlet.Filter was the solution for me.
Upvotes: 0
Reputation: 1108692
The browser will by default send the data in the same encoding as the requested page was returned in. This is controllable by the HTTP Content-Type header, which you can also set using the HTML <meta> tag.
The accept-charset attribute of the HTML <form> element should be avoided since it's broken in MSIE. Almost all non-UTF-8 encodings are ignored and the data will be sent in the platform default encoding (which is usually CP-1252 in case of Windows).
To fix A and B (POST) you basically need to call HttpServletRequest#setCharacterEncoding() before gathering request parameters. Keep in mind that this is a one-time task: the parameters are parsed on first access, so you cannot get a parameter, then change the encoding, and then "re-get" the parameters.
To fix C (GET) you basically need to set the request URI encoding in the server configuration. Since it's unclear which server you're using, here's a Tomcat-targeted example: in the HTTP connector set the following attribute:
<Connector (...) URIEncoding="ISO-8859-1" />
However, this is already the default encoding in most servers, so you may not need to do anything for C.
As an alternative, you can grab the raw, not yet URL-decoded data from the request body (in case of POST) by HttpServletRequest#getInputStream() or from the query string (in case of GET) by HttpServletRequest#getQueryString(), guess the encoding yourself based on the characters available in the parameters, and then URL-decode accordingly using the guessed encoding. A hidden input element with a specific character which is encoded differently in UTF-8 and ISO-8859-1 may help a lot in this.
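One way to do the guessing could look like the sketch below. It is only an illustration under some assumptions: the class ParamCharsetGuesser and its methods are hypothetical helpers (not part of the servlet API), the input is a single raw parameter value already split out of getQueryString() on & and =, and only UTF-8 and ISO-8859-1 are considered. The heuristic is to try a strict UTF-8 decode first and fall back to ISO-8859-1 when the bytes are not valid UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the servlet API.
final class ParamCharsetGuesser {

    // Converts %XX escapes and '+' to raw bytes without choosing a charset yet.
    // Assumes the input is plain ASCII, as a raw query string normally is.
    static byte[] percentDecodeToBytes(String rawValue) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < rawValue.length(); i++) {
            char c = rawValue.charAt(i);
            if (c == '%' && i + 2 < rawValue.length()) {
                int hi = Character.digit(rawValue.charAt(i + 1), 16);
                int lo = Character.digit(rawValue.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            // Malformed escapes and ordinary characters are copied through.
            out.write(c == '+' ? ' ' : c);
        }
        return out.toByteArray();
    }

    // Tries strict UTF-8 first; falls back to ISO-8859-1 on invalid bytes.
    static String decodeGuessingCharset(String rawValue) {
        byte[] bytes = percentDecodeToBytes(rawValue);
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException notUtf8) {
            // Every byte sequence is valid ISO-8859-1, so this cannot fail.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        // Both inputs represent the same text, sent by pages in different encodings.
        System.out.println(decodeGuessingCharset("caf%C3%A9")); // UTF-8 source
        System.out.println(decodeGuessingCharset("caf%E9"));    // ISO-8859-1 source
    }
}
```

The guess is not foolproof: an ISO-8859-1 value whose bytes happen to form valid UTF-8 would be misread as UTF-8. That ambiguity is what the hidden input with a known character is meant to remove.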
Upvotes: 3