archik

Reputation: 455

Wrong Content-Length for response text with umlaut

I have a problem with an umlaut. I return a description from this request handler:

@RequestMapping(value = "/description", method = RequestMethod.POST, consumes = "application/json", produces = "text/plain;charset=UTF-8")
@ResponseBody
private String getDescription() {
    return "ärchik";
}

On the frontend, response.responseText is missing the last letter: response.responseText = "ärchi".

I found that the problem is the wrong Content-Length: 7. If I set Content-Length: 8 instead, it works and returns the full description "ärchik".

But I do not understand why it has to be 8, since:

"ärchik".getBytes("UTF-8").length = 7

Response headers:

Cache-Control: must-revalidate
Content-Length: 7
Content-Type: text/plain;charset=utf-8
Date: Mon, 14 Apr 2014 09:08:26 GMT
Server: Apache-Coyote/1.1

Upvotes: 4

Views: 1645

Answers (3)

MvG

Reputation: 60858

I'm turning the core of my comment into an answer, since it seems I was on the right track.

The most likely reason for the string to be one byte longer than expected is that the 'ä' got encoded as three bytes instead of two. This can happen if one uses not the precomposed code point U+00E4 (UTF-8: c3 a4) but instead the letter 'a' (a plain ASCII letter at U+0061) followed by the combining diaeresis U+0308, which together encode as 61 cc 88. Unicode has several normal forms, and the longer encoding would usually be the result of conversion to NFD.
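To make the byte counts concrete, here is a small sketch that encodes "ärchik" in both normal forms with the standard `java.text.Normalizer` and dumps the UTF-8 bytes. The class name `UmlautEncodings` and its helpers are illustrative only; the input uses the `\u00e4` escape so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Compares the UTF-8 encoding of "ärchik" in NFC (precomposed U+00E4)
// and NFD ('a' U+0061 followed by combining diaeresis U+0308).
class UmlautEncodings {

    // Formats bytes as lowercase hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Normalizes the text to the given form, then encodes it as UTF-8.
    static byte[] utf8(String s, Normalizer.Form form) {
        return Normalizer.normalize(s, form).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u00e4rchik"; // "ärchik" with the precomposed ä
        byte[] nfc = utf8(text, Normalizer.Form.NFC);
        byte[] nfd = utf8(text, Normalizer.Form.NFD);
        System.out.println("NFC " + nfc.length + " bytes: " + hex(nfc));
        System.out.println("NFD " + nfd.length + " bytes: " + hex(nfd));
    }
}
```

Running it prints `NFC 7 bytes: c3 a4 72 63 68 69 6b` and `NFD 8 bytes: 61 cc 88 72 63 68 69 6b`, which matches the 7 vs. 8 discrepancy in the question.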

Looking at your own answer, it seems you did exactly that normalization, at a point where the content length had already been determined from the un-normalized (or perhaps NFC-normalized) string.
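The symptom in the question (the client sees "ärchi") follows from that ordering. The sketch below simulates it, assuming the client simply stops reading after the declared Content-Length, as the browser evidently did here. The class `StaleContentLength` and the method `readAsClient` are hypothetical names for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Arrays;

// Simulates a client that trusts a stale Content-Length header:
// the header was computed from the NFC text (7 bytes), but the body
// actually written is the NFD text (8 bytes).
class StaleContentLength {

    // Keeps only the first declaredLength bytes, then decodes them as UTF-8.
    static String readAsClient(byte[] body, int declaredLength) {
        byte[] received = Arrays.copyOf(body, Math.min(declaredLength, body.length));
        return new String(received, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e4rchik";
        int declared = original.getBytes(StandardCharsets.UTF_8).length; // 7, from the NFC text
        byte[] sent = Normalizer.normalize(original, Normalizer.Form.NFD)
                .getBytes(StandardCharsets.UTF_8); // 8 bytes actually written
        String seen = readAsClient(sent, declared);
        // "a" + U+0308 + "rchi": displayed as "ärchi", the trailing 'k' is cut off
        System.out.println(Normalizer.normalize(seen, Normalizer.Form.NFC).equals("\u00e4rchi"));
    }
}
```

The last byte dropped is the 'k', not part of the umlaut, because the NFD 'ä' occupies the first three bytes.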

Upvotes: 5

archik

Reputation: 455

It was my own fault: the problem is in a filter I wrote.

// content-length = 7 (derived from the original, un-normalized body)
chain.doFilter(request, wrappedResponse);
byte[] bytes = wrappedResponse.getByteArray();
String out = new String(bytes, utf8Charset);           // 7 bytes in UTF-8 (NFC)
out = Normalizer.normalize(out, Normalizer.Form.NFD);  // 8 bytes in UTF-8 (NFD)
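One way to fix this is to normalize first and derive Content-Length from the bytes that are actually written. Below is a minimal sketch of that ordering, assuming the filter keeps capturing the body as shown. The helper `normalizeUtf8` is hypothetical, and the servlet calls appear only in comments because the rest of the filter and the wrapper are not shown in the question. Alternatively, normalizing to NFC instead of NFD would leave this particular string at 7 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Sketch of the corrected ordering: decode, normalize, re-encode,
// and only then take Content-Length from the final byte array.
class NormalizingBody {

    static byte[] normalizeUtf8(byte[] captured) {
        String text = new String(captured, StandardCharsets.UTF_8);
        String normalized = Normalizer.normalize(text, Normalizer.Form.NFD);
        return normalized.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] captured = "\u00e4rchik".getBytes(StandardCharsets.UTF_8); // 7 bytes from the controller
        byte[] body = normalizeUtf8(captured);                              // 8 bytes after NFD
        // In the real filter, on the underlying (unwrapped) response, something like:
        //   response.setContentLength(body.length);
        //   response.getOutputStream().write(body);
        // so the stale value of 7 is never the one sent.
        System.out.println("Content-Length: " + body.length);
    }
}
```

Running it prints `Content-Length: 8`.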

Upvotes: 2

Cheung

Reputation: 174

The Spring/Tomcat response is correct.

Is response.responseText taken from an Ajax response object?

My guess: the JS file encoding is not UTF-8, or some JavaScript function does not handle UTF-8 correctly.

Upvotes: 0
