Reputation: 164297
Play refuses to accept a POST request when the data contains Unicode text, and I get:
Error parsing application/x-www-form-urlencoded
Everything seemed to work fine until I tried a request with text in Hebrew instead of English. A request with
value=hey
works fine but a request with
value=%u05D4%u05D9%u05D9
fails.
I found a report of the same problem, but its author said he made it work by changing play/api/mvc/ContentType.scala, which is something I'd like to avoid.
Any ideas?
Thanks!
I'm aware that this encoding does not conform to the standard for application/x-www-form-urlencoded, but that's the case I need to deal with: changing the client side is currently not an option, and it uses the JavaScript escape method.
I'm looking for a solution on the backend side of things, that is, a Play solution.
It would be nice to find a solution that can be implemented in Java, but for now it looks like the solution is to write my own BodyParser (in Scala).
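For illustration, the decoding step such a parser would need can be sketched in plain Java. This is only a sketch under assumptions: the class and method names (LegacyUnescape, unescape) are hypothetical, it mirrors what the JavaScript unescape function does (%uXXXX and %XX both become a single UTF-16 code unit), it does not treat + as a space because escape leaves + untouched, and wiring it into a Play BodyParser is not shown.

```java
// Hypothetical helper, not part of Play: decodes strings produced by JavaScript's escape().
final class LegacyUnescape {

    // Turns %uXXXX and %XX sequences back into characters; everything else is copied as-is.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%') {
                // %uXXXX: four hex digits giving one UTF-16 code unit
                if (i + 6 <= s.length() && s.charAt(i + 1) == 'u' && isHex(s, i + 2, 4)) {
                    out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 6;
                    continue;
                }
                // %XX: two hex digits giving a code unit below 256 (not a UTF-8 byte)
                if (i + 3 <= s.length() && isHex(s, i + 1, 2)) {
                    out.append((char) Integer.parseInt(s.substring(i + 1, i + 3), 16));
                    i += 3;
                    continue;
                }
            }
            // Malformed or ordinary character: keep it unchanged
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // True if the len characters starting at start are all hexadecimal digits.
    private static boolean isHex(String s, int start, int len) {
        for (int k = start; k < start + len; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The failing value from above decodes to the three Hebrew letters.
        System.out.println(unescape("%u05D4%u05D9%u05D9"));
    }
}
```

The raw request body would have to be split into key/value pairs first, with this applied to each value in place of the standard URL decoder.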
Upvotes: 1
Views: 2268
Reputation: 719238
According to my research, the correct way to handle Unicode in an application/x-www-form-urlencoded
body is to translate the text to bytes in the document's default charset (typically UTF-8) and then URL-encode those bytes (i.e. %-encode them).
Certainly what you are currently doing (with '%uxxxx' sequences) is not a valid encoding as far as the specifications are concerned. (You can't just pull stuff out of the air like that ... and expect it to work.)
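To make the difference concrete, here is a minimal Java sketch of the standard encoding, assuming UTF-8 as the charset and Java 10 or later for the URLEncoder.encode(String, Charset) overload. The class name FormEncodingExample is hypothetical and only there for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, named for illustration only.
final class FormEncodingExample {

    // Standard form encoding: characters -> UTF-8 bytes -> one %XX per byte.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8); // requires Java 10+
    }

    public static void main(String[] args) {
        String hebrew = "\u05D4\u05D9\u05D9"; // the three Hebrew letters from the question
        System.out.println("value=" + encode(hebrew)); // prints value=%D7%94%D7%99%D7%99
    }
}
```

Each Hebrew letter becomes two UTF-8 bytes, so the body a server can parse is value=%D7%94%D7%99%D7%99 rather than value=%u05D4%u05D9%u05D9.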
References:
Wikipedia: http://en.wikipedia.org/wiki/Percent-encoding#The_application.2Fx-www-form-urlencoded_type
HTML spec: http://www.w3.org/TR/html5/forms.html#application/x-www-form-urlencoded-encoding-algorithm . This gives the algorithm that a browser is supposed to use. If you do / produce something analogous, you should be fine.
I note that you discovered this escape syntax via your browser's console. Here's what MSDN says about the JavaScript escape() method:
"The escape and unescape functions do not work properly for non-ASCII characters and have been deprecated. In JavaScript 1.5 and later, use encodeURI, decodeURI, encodeURIComponent, and decodeURIComponent."
I think that "do not work properly" means that they use a non-standard escaping syntax that browsers don't recognize. Lesson: read the spec rather than relying on experiments.
Upvotes: 1