Reputation: 73
I'm implementing a REST service that receives POST requests.
The encoding in my system is UTF-8.
I'm using JBoss 5, in which the servlet that receives the requests follows the HTTP/1.1 specification in RFC 2068, which states that:
When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.
So when a client that invokes my service uses, for example, UTF-8 without specifying a charset, and the POST body contains characters outside US-ASCII, the JBoss servlet assumes "ISO-8859-1" and decodes the body incorrectly, so my system receives "broken" characters. For example, instead of the string "día" I receive "dÂa".
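To make the failure concrete, here is a minimal sketch that reproduces the mismatch outside the servlet: it encodes a string as UTF-8 and then decodes the same bytes as ISO-8859-1. The class and method names are hypothetical and exist only for this illustration. How the garbled characters are displayed depends on the terminal or log viewer.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // which is what happens when the server falls back to the old default.
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String received = misdecode("d\u00eda"); // "día"
        // "í" takes two bytes in UTF-8 (0xC3 0xAD), so three characters
        // become four after the wrong decoding.
        System.out.println(received.length());
        for (char c : received.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

Because ISO-8859-1 maps every byte to exactly one character, the damage is reversible as long as the string has not been altered further: re-encoding it as ISO-8859-1 and decoding as UTF-8 gives back the original text.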
The approach I found for "protecting" my system is to require the client to specify the charset as a parameter of the Content-Type header. If no charset is specified, I respond with an HTTP 403 and a message indicating that "the charset value must be specified".
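A rough sketch of that check, written as plain Java so it runs without a servlet container. The names `CharsetCheck`, `explicitCharset` and `statusFor` are hypothetical, and the header parsing is deliberately simplified (it does not handle a `;` inside a quoted parameter value). In a real servlet the header value would come from the request object.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Optional;

class CharsetCheck {
    // Returns the charset named in a Content-Type header value, or empty if
    // the header is missing, has no charset parameter, or names a charset
    // this JVM does not know.
    static Optional<Charset> explicitCharset(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes: charset="UTF-8".
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Optional.of(Charset.forName(value));
            } catch (IllegalArgumentException e) {
                // Covers both illegal and unsupported charset names.
                return Optional.empty();
            }
        }
        return Optional.empty();
    }

    // Mirrors the policy described above: reject with 403 when no usable
    // charset was declared, otherwise accept.
    static int statusFor(String contentType) {
        return explicitCharset(contentType).isPresent() ? 200 : 403;
    }
}
```

For example, `statusFor("text/plain; charset=UTF-8")` returns 200, while `statusFor("text/plain")` returns 403.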
Is there anything wrong with this approach?
Upvotes: 0
Views: 78
Reputation: 42017
RFC 2068 has been obsoleted twice and really is irrelevant. You need to look at RFC 7231, which doesn't define a default anymore. This means that the default is governed by the definition of the media type.
For text/plain, this implies US-ASCII (as far as I remember), so clients that want to send non-ASCII characters really need to specify the charset.
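On the client side, declaring the charset could look roughly like the sketch below. It assumes Java 11 or later with the `java.net.http` client, and the URL and class name are placeholders rather than anything from the question. The request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class ClientRequestSketch {
    // Builds a POST whose body is encoded as UTF-8 and whose Content-Type
    // header says so explicitly, so the server does not have to guess.
    static HttpRequest build(String body) {
        return HttpRequest.newBuilder(URI.create("http://localhost:8080/service")) // placeholder URL
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("d\u00eda"); // "día"
        System.out.println(request.headers().firstValue("Content-Type").orElse("(none)"));
    }
}
```

The point is that the charset used to encode the body and the charset named in the header are set in the same place, so they cannot drift apart.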
Upvotes: 2