Arun

Reputation: 645

Java URLEncode giving different results

I have this code stub:

System.out.println(param+"="+value);
param = URLEncoder.encode(param, "UTF-8");
value = URLEncoder.encode(value, "UTF-8");
System.out.println(param+"="+value);

This gives this result in Eclipse:

p=指甲油
p=%E6%8C%87%E7%94%B2%E6%B2%B9

But when I run the same code from the command line, I get the following output:

p=指甲油
p=%C3%8A%C3%A5%C3%A1%C3%81%C3%AE%E2%89%A4%C3%8A%E2%89%A4%CF%80

What could be the problem?

Upvotes: 3

Views: 1057

Answers (1)

BalusC

Reputation: 1108852

Your Mac was using the Mac OS Roman encoding in the terminal. The Chinese characters were incorrectly interpreted as Mac OS Roman instead of UTF-8 before being sent to Java.

As evidence, those Chinese characters are represented in UTF-8 by the following (hex) bytes:

  • 指 = 0xE6 0x8C 0x87
  • 甲 = 0xE7 0x94 0xB2
  • 油 = 0xE6 0xB2 0xB9

Now check the Mac OS Roman codepage layout. There, those (hex) bytes represent the following characters:

  • 0xE6 0x8C 0x87 = Ê å á
  • 0xE7 0x94 0xB2 = Á î ≤
  • 0xE6 0xB2 0xB9 = Ê ≤ π

Now, put them together and URL-encode them using UTF-8:

System.out.println(URLEncoder.encode("ÊåáÁî≤Ê≤π", "UTF-8"));

Look what it prints:

%C3%8A%C3%A5%C3%A1%C3%81%C3%AE%E2%89%A4%C3%8A%E2%89%A4%CF%80
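
The whole chain can also be reproduced in one small program. This is only a sketch: the class and method names are made up for illustration, the text is written with Unicode escapes so the source file's own encoding cannot interfere, and it assumes the JDK ships the extended x-MacRoman charset (most full JDKs do, but it is not guaranteed, hence the isSupported check).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode text as UTF-8 bytes, then decode those bytes with a different
    // (wrong) charset, which is effectively what the terminal did.
    static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    // URL-encode using UTF-8, converting the checked exception to an unchecked one.
    static String urlEncodeUtf8(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // The three Chinese characters from the question, as Unicode escapes.
        String original = "\u6307\u7532\u6cb9";

        // Correct path: prints %E6%8C%87%E7%94%B2%E6%B2%B9
        System.out.println(urlEncodeUtf8(original));

        // Broken path: UTF-8 bytes misread as Mac OS Roman, then URL-encoded.
        if (Charset.isSupported("x-MacRoman")) {
            String garbled = misdecode(original, Charset.forName("x-MacRoman"));
            System.out.println(urlEncodeUtf8(garbled));
        } else {
            System.out.println("x-MacRoman is not available in this runtime");
        }
    }
}
```

When x-MacRoman is available, the second line printed should be the same long string the command line run produced.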

To fix your problem, tell your Mac to use UTF-8 encoding in the terminal. Honestly, I can't answer that part off the top of my head as I don't do Mac. Your Eclipse encoding configuration is totally fine, but in case you ever need to change it, you can configure it via Window > Preferences > General > Workspace > Text File Encoding.


Update: I missed a comment:

I am reading the value from a text file

If those variables originate from a text file instead of from command line input (as I initially expected), then you need to solve the problem differently. Apparently, you were using a Reader implementation which uses the runtime environment's default character encoding, like so:

Reader reader = new FileReader("/file.txt");
// ...

You should instead be explicitly specifying the desired encoding while creating the reader. You can do that with the InputStreamReader constructor.

Reader reader = new InputStreamReader(new FileInputStream("/file.txt"), "UTF-8");
// ...

This explicitly tells Java to read /file.txt using UTF-8 instead of the runtime environment's default encoding, which is available via Charset#defaultCharset().

System.out.println("This runtime environment uses as default charset " + Charset.defaultCharset());
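
To see the difference for yourself, here is a small self-contained sketch. It writes a temporary UTF-8 file (a stand-in for your text file; the class name, helper names, and temp file are illustrative, not from your code) and reads it back once with an explicit charset and once with the default one.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetRead {

    // Read the whole file, decoding its bytes with the given charset.
    static String readAll(Path file, Charset charset) {
        try (Reader reader = new InputStreamReader(Files.newInputStream(file), charset)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Create a throwaway file whose content is stored as UTF-8 bytes.
    static Path writeSampleFile(String content) {
        try {
            Path file = Files.createTempFile("sample", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String expected = "\u6307\u7532\u6cb9";
        Path file = writeSampleFile(expected);

        String explicit = readAll(file, StandardCharsets.UTF_8);
        String viaDefault = readAll(file, Charset.defaultCharset());

        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Explicit UTF-8 read intact: " + explicit.equals(expected));
        // This one is only true when the default charset happens to be UTF-8.
        System.out.println("Default charset read intact: " + viaDefault.equals(expected));
    }
}
```

The explicit read gives the same result in Eclipse and on the command line, while the default-charset read depends on how the JVM was started.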

Upvotes: 9
