Reputation: 6339
I'm building a RESTful service with Jersey that should produce UTF-8 encoded replies. Here is a code snippet:
public static class Data {
    private String value;

    public Data(String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }

    public void setValue(String value) {
        this.value = value;
    }
}

@GET
@Produces(MediaType.APPLICATION_JSON)
public Response method() {
    Data response = new Data("€");
    return Response.status(Response.Status.OK)
            .type(MediaType.APPLICATION_JSON + ";charset=UTF-8")
            .entity(response)
            .build();
}
It's supposed to produce the following reply:
{"value":"€"}
or as byte array:
[123, 34, 118, 97, 108, 117, 101, 34, 58, 34, -30, -126, -84, 34, 125]
Note that the euro sign is encoded as three bytes: -30, -126, -84 (0xe2 0x82 0xac).
However, it produces the following response:
{"value":"â¬"}
or as byte array:
[123, 34, 118, 97, 108, 117, 101, 34, 58, 34, -61, -94, -62, -126, -62, -84, 34, 125]
Note that the euro sign is now encoded as six bytes: -61, -94, -62, -126, -62, -84 (0xc3 0xa2 0xc2 0x82 0xc2 0xac).
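For anyone comparing the signed values printed by Arrays.toString with the hex notation above, here is a minimal sketch of the conversion. The class name ByteHex and the helper toHex are hypothetical names used only for illustration; they are not part of the service.

```java
public class ByteHex {

    // Formats each byte as an unsigned two-digit hex value.
    // Java bytes are signed, so masking with 0xff maps -30 to 226 (0xe2).
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Expected (correct) encoding of the euro sign
        System.out.println(toHex(new byte[] {-30, -126, -84}));
        // Corrupted encoding observed in the response
        System.out.println(toHex(new byte[] {-61, -94, -62, -126, -62, -84}));
    }
}
```

The first line printed is e2 82 ac and the second is c3 a2 c2 82 c2 ac, matching the two byte sequences quoted above.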
I've found a conversion sequence that reproduces this corruption: at some point the UTF-8 encoded data is treated as Latin-1 encoded data and then encoded as UTF-8 again.
Data data = new Data("€");
org.codehaus.jackson.map.ObjectMapper mapper
        = new org.codehaus.jackson.map.ObjectMapper();
try {
    String strData = mapper.writeValueAsString(data);
    System.out.println(strData);
    byte[] rawData = mapper.writeValueAsBytes(data);
    System.out.println(Arrays.toString(rawData));
    String asLatin1 = new String(rawData, "ISO-8859-1");
    byte[] brokenUtf8 = asLatin1.getBytes("UTF-8");
    System.out.println(Arrays.toString(brokenUtf8));
} catch (IOException e) {
    System.out.println("Fail " + e.getMessage());
}
The service runs on two machines, one with apache-tomcat-7.0.30 and the other with apache-tomcat-7.0.23. The former produces a correct UTF-8 response, while the latter produces corrupted UTF-8. I can't figure out what causes the difference in behavior or how to resolve the problem.
Upvotes: 1
Views: 731
Reputation: 6339
The cause turned out to be mundane but very hard to find. Ant's javac task had an explicit encoding set:
<javac destdir="${classes}" includeantruntime="false" source="1.6" target="1.6" debug="true" encoding="ISO-8859-1" classpathref="main.classpath">
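With that setting, javac reads each byte of a source file as one Latin-1 character, so a "€" literal stored as three UTF-8 bytes was compiled into a three-character string, and all non-ASCII characters were corrupted in the same way. It worked under one Tomcat because that deployment was built with Eclipse, while the other deployment was built with Ant. The Tomcat version itself was not the cause.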
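The effect can be reproduced without Ant. The following is a minimal sketch, assuming the source file was saved as UTF-8. The class SourceEncodingDemo and the method literalAsSeenByCompiler are hypothetical names used only for illustration, and StandardCharsets requires Java 7 or later (the build above targets 1.6, where the charset names would be passed as strings instead). The literal is written as the escape \u20ac so that the demo does not depend on the encoding used to compile it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SourceEncodingDemo {

    // Simulates javac reading a UTF-8 source file with encoding="ISO-8859-1":
    // every byte of the literal on disk becomes one separate character.
    static String literalAsSeenByCompiler(String literalInSource) {
        byte[] bytesOnDisk = literalInSource.getBytes(StandardCharsets.UTF_8);
        return new String(bytesOnDisk, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String compiled = literalAsSeenByCompiler("\u20ac");

        // The one-character literal has become three characters.
        System.out.println(compiled.length());

        // Writing the response as UTF-8 is done correctly, but it now
        // encodes three wrong characters instead of one euro sign.
        byte[] response = compiled.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(response));
    }
}
```

This prints 3 followed by [-61, -94, -62, -126, -62, -84], the same six bytes seen in the corrupted response. The string was already wrong in the compiled class, so nothing in Jersey, Jackson, or Tomcat needed fixing; making the encoding attribute of the javac task match the actual encoding of the source files (UTF-8 under the assumption above) addresses it at build time.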
Upvotes: 1
Reputation:
If it works in 7.0.30 but not in 7.0.23, perhaps it's a bug that was found and fixed in between? Have you checked the Tomcat changelog to see if there is anything relevant in there?
Upvotes: 0