Jakob Jenkov

Reputation: 352

Java: String UTF-8 encoding difference between local app and Google App Engine

I am trying to get a Google App Engine web app to send UTF-8 encoded text back to the browser. To get the bytes, I write this:

byte[] utf8Bytes = "æøå".getBytes("UTF-8");

When I do this locally, I get a byte array with 6 bytes back. When I do this on Google App Engine, I get an array with 12 bytes back. Weird, eh?

Does anyone know why?

I have succeeded in writing UTF-8 encoded text from GAE by encoding the bytes myself and writing the raw bytes back, like this:

output.write(new byte[]{(byte)0xc3, (byte)0xa5, (byte) 0xc3, (byte)0xa6, (byte)0xc3, (byte)0xb8 });

And this actually works. But does anyone know why the strings are encoded differently on GAE than they are locally?

Note: Encoding the characters via Unicode escapes worked, like this:

byte[] utf8Bytes = "\u00E5\u00F8\u00E6".getBytes("UTF-8");

Upvotes: 4

Views: 1511

Answers (2)

Dr. Max Völkel

Reputation: 1859

Are you sure you have set the character encoding on your HttpServletResponse before getting a Writer?

Upvotes: 1

jarnbjo

Reputation: 34323

The bytes you are getting from GAE make me assume that the source code file with the "æøå" literal is saved as UTF-8, but compiled with a compiler that expects the source files to be encoded as ISO-8859-1, ISO-8859-15 or Cp1252.
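To illustrate why that would give 12 bytes, here is a minimal sketch that simulates the suspected mismatch. It is not what the compiler literally does internally, and the class and method names (SourceEncodingMismatch, misreadAsLatin1) are made up for this example. It takes the UTF-8 bytes of the intended string, decodes them as ISO-8859-1 the way a misconfigured compiler would read the literal, and encodes the result as UTF-8 again. Unicode escapes are used so the example itself does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

public class SourceEncodingMismatch {

    // Hypothetical helper: simulates a compiler reading UTF-8 source bytes
    // as ISO-8859-1, so every UTF-8 byte turns into one char of the literal.
    static String misreadAsLatin1(String intended) {
        byte[] sourceBytes = intended.getBytes(StandardCharsets.UTF_8);
        return new String(sourceBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String intended = "\u00E6\u00F8\u00E5"; // the characters æøå

        // Correctly read literal: 3 chars, 2 UTF-8 bytes each.
        System.out.println(intended.getBytes(StandardCharsets.UTF_8).length); // 6

        // Misread literal: 6 chars, all in the range U+0080..U+00FF.
        String misread = misreadAsLatin1(intended);
        System.out.println(misread.length()); // 6

        // Each of those 6 chars needs 2 bytes in UTF-8.
        System.out.println(misread.getBytes(StandardCharsets.UTF_8).length); // 12
    }
}
```

The 6 and 12 match the sizes reported in the question, which is consistent with the local build reading the file as UTF-8 and the deployed build reading it as a single-byte encoding.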

If you are building your source code with Ant or Maven, you have to specify the source file encoding in your build.xml or pom.xml.

Upvotes: 5
