Gustavo Fava

Is there anything wrong with requiring the client to specify the charset in the HTTP Content-Type header field?

I'm implementing a REST service that receives POST requests.

The encoding in my system is UTF-8.

I'm using JBoss 5, in which the servlet that receives the requests follows the HTTP/1.1 specification in RFC 2068, which states:

When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.

So when a client that invokes my service sends UTF-8 without specifying a charset, and the POST body contains characters outside US-ASCII, the JBoss servlet assumes ISO-8859-1 and decodes the body incorrectly, so my system receives garbled characters. For example, instead of the string "día" I receive "dÂa".
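The mismatch can be reproduced with the standard library alone. The sketch below assumes the client encodes the body as UTF-8 and the server decodes it as ISO-8859-1; the class and method names are illustrative. Because "í" takes two bytes in UTF-8 (0xC3 0xAD), the server ends up with two Latin-1 characters in its place. The exact garbled glyphs you see depend on how the result is later displayed, but the underlying string always has one character too many.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Encodes with the charset the client actually used (UTF-8), then decodes
    // with the charset the server assumes when none is declared (ISO-8859-1).
    static String decodeWithLatin1Fallback(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps every byte to exactly one char, so the original bytes
    // can be recovered and re-decoded as UTF-8.
    static String repair(String misDecoded) {
        byte[] wire = misDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "d\u00EDa"; // "día"
        String received = decodeWithLatin1Fallback(original);
        System.out.println(original.length()); // 3
        System.out.println(received.length()); // 4
        System.out.println(repair(received).equals(original)); // true
    }
}
```

Repairing after the fact only works when you know which charset was really used, which is exactly the information a missing charset parameter leaves out.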

The approach I found to "protect" my system is to require the client to specify the charset parameter in the Content-Type header. If no charset is specified, I respond with HTTP 403 and a message stating that "the charset value must be specified".
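A minimal sketch of the check described above, using only the standard library so it does not depend on the servlet API. The helper name `charsetOf` is hypothetical; in a servlet you would pass it the value of the Content-Type header and send your error response when it returns an empty result. It is deliberately simplified: it does not handle semicolons inside quoted parameter values.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Optional;

public class ContentTypeCharset {
    // Hypothetical helper: returns the charset named in a Content-Type value,
    // or Optional.empty() if the parameter is missing or names an unknown charset.
    static Optional<Charset> charsetOf(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters follow it
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            // Parameter names are case-insensitive
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Parameter values may be quoted strings
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Optional.of(Charset.forName(value));
            } catch (IllegalArgumentException e) {
                // Covers IllegalCharsetNameException and UnsupportedCharsetException
                return Optional.empty();
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/plain; charset=UTF-8")); // Optional[UTF-8]
        System.out.println(charsetOf("text/plain"));                // Optional.empty
    }
}
```

Whether a missing charset should be answered with 403 or with another status is a separate design decision; the parsing is the same either way.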

Is there anything wrong with this approach?

Answers (1)

Julian Reschke

RFC 2068 has been obsoleted twice and really is irrelevant. You need to look at RFC 7231, which doesn't define a default anymore. This means that the default is governed by the definition of the media type.

For text/plain, the default is US-ASCII (RFC 2046, reaffirmed by RFC 6657), so clients that want to send non-ASCII characters really need to specify the charset.
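On the client side, declaring the charset is a one-line change. The sketch below assumes Java 11+ and uses `java.net.http.HttpRequest`; the URI is a placeholder and the request is only built, not sent. The point is that the charset used to encode the body and the charset declared in Content-Type are the same value.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class ClientSendsCharset {
    // Builds (but does not send) a POST whose Content-Type names the charset
    // actually used to encode the body. The URI is a placeholder.
    static HttpRequest buildRequest(String body) {
        return HttpRequest.newBuilder(URI.create("http://localhost:8080/service"))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildRequest("d\u00EDa"); // "día"
        System.out.println(req.headers().firstValue("Content-Type").orElse("(none)"));
        // text/plain; charset=UTF-8
    }
}
```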
