Reputation: 645
I have this code stub:
System.out.println(param+"="+value);
param = URLEncoder.encode(param, "UTF-8");
value = URLEncoder.encode(value, "UTF-8");
System.out.println(param+"="+value);
This gives this result in Eclipse:
p=指甲油
p=%E6%8C%87%E7%94%B2%E6%B2%B9
But when I run the same code from the command line, I get the following output:
p=指甲油
p=%C3%8A%C3%A5%C3%A1%C3%81%C3%AE%E2%89%A4%C3%8A%E2%89%A4%CF%80
What could be the problem?
Upvotes: 3
Views: 1057
Reputation: 1108852
Your Mac's terminal was using the Mac OS Roman encoding. The UTF-8 bytes of those Chinese characters were incorrectly interpreted as Mac OS Roman instead of UTF-8 before they reached Java.
As evidence, those Chinese characters are represented in UTF-8 by the following (hex) bytes:
指 = 0xE6 0x8C 0x87
甲 = 0xE7 0x94 0xB2
油 = 0xE6 0xB2 0xB9
Then check the Mac OS Roman codepage layout. There, those (hex) bytes represent the following characters:
0xE6 = Ê
0x8C = å
0x87 = á
0xE7 = Á
0x94 = î
0xB2 = ≤
0xE6 = Ê
0xB2 = ≤
0xB9 = π
Now, put them together and URL-encode them using UTF-8:
System.out.println(URLEncoder.encode("ÊåáÁî≤Ê≤π", "UTF-8"));
Look what it prints:
%C3%8A%C3%A5%C3%A1%C3%81%C3%AE%E2%89%A4%C3%8A%E2%89%A4%CF%80
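The whole chain can be reproduced in one small program. This is only a sketch: the class and method names are made up for illustration, and it assumes your JDK ships the x-MacRoman charset (standard JDK builds normally do, but Charset.forName will throw if it is missing). The Chinese text is written with Unicode escapes so that the source file's own encoding cannot interfere.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Takes the UTF-8 bytes of the text and decodes them with the wrong charset,
    // which is what a misconfigured environment effectively does.
    static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    // URL-encodes using UTF-8, as in the question.
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u6307\u7532\u6CB9"; // the three Chinese characters from the question
        // Assumption: x-MacRoman is available in this JDK.
        Charset macRoman = Charset.forName("x-MacRoman");
        String garbled = misdecode(original, macRoman);

        System.out.println(urlEncode(original)); // %E6%8C%87%E7%94%B2%E6%B2%B9
        System.out.println(urlEncode(garbled));  // %C3%8A%C3%A5%C3%A1%C3%81%C3%AE%E2%89%A4%C3%8A%E2%89%A4%CF%80
    }
}
```

The first line matches the Eclipse output and the second matches the command line output, which suggests the bytes were decoded as Mac OS Roman somewhere before URLEncoder ever saw them.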
To fix your problem, tell your Mac to use UTF-8 encoding in the terminal. Honestly, I can't answer that part off the top of my head as I don't use a Mac. Your Eclipse encoding configuration is totally fine, but should you ever need to change it, you can do so via Window > Preferences > General > Workspace > Text File Encoding.
Update: I missed a comment:
I am reading the value from a text file
If those variables originate from a text file instead of from command-line input, as I initially assumed, then you need to solve the problem differently. Apparently, you were using a Reader implementation which uses the runtime environment's default character encoding, like so:
Reader reader = new FileReader("/file.txt");
// ...
You should instead explicitly specify the desired encoding when creating the reader. You can do that with the InputStreamReader constructor:
Reader reader = new InputStreamReader(new FileInputStream("/file.txt"), "UTF-8");
// ...
This explicitly tells Java to read /file.txt using UTF-8 instead of the runtime environment's default encoding, which is available via Charset#defaultCharset():
System.out.println("This runtime environment uses as default charset " + Charset.defaultCharset());
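To check that reading with an explicit encoding behaves the same regardless of the default charset, here is a rough self-contained sketch. The class name, the helper methods and the temporary file are all hypothetical; in your case you would point the reader at your real file instead. It writes the text as UTF-8 bytes and reads it back through an InputStreamReader that is told to use UTF-8.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitEncodingRead {
    // Reads the first line of the file as UTF-8, ignoring the platform default charset.
    static String readFirstLine(Path file) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary file as UTF-8 bytes and reads it back.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("encoding-demo", ".txt"); // hypothetical stand-in for /file.txt
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readFirstLine(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u6307\u7532\u6CB9"; // escaped so the source file encoding does not matter
        // The text should survive unchanged whatever the default charset is.
        System.out.println(text.equals(roundTrip(text))); // true
    }
}
```

If you run it with different default encodings (for example from Eclipse and from the terminal), the result should stay the same, because the default charset is no longer involved in decoding the file.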
Upvotes: 9