Reputation: 4262
On the webapp server, when I try encoding "médicaux_Jérôme.txt" using java.net.URLEncoder, it gives the following string:
me%CC%81dicaux_Je%CC%81ro%CC%82me.txt
On my backend server, encoding the same string gives the following:
m%C3%A9dicaux_J%C3%A9r%C3%B4me.txt
Can someone help me understand why the same input produces different output? Also, how can I get standardized output each time I encode the same string?
Upvotes: 3
Views: 1617
Reputation: 135762
The outcome depends on the platform's default encoding if you don't specify one.
See the java.net.URLEncoder javadocs:
encode(String s)
Deprecated.
The resulting string may vary depending on the platform's default encoding. Instead, use the encode(String,String) method to specify the encoding.
So, use the suggested method and specify the encoding:
String urlEncodedString = URLEncoder.encode(stringToBeUrlEncoded, "UTF-8");
About different representations for the same string, if you specified "UTF-8":
The two URL-encoded strings you gave in the question, although encoded differently, decode to text that renders identically and is canonically equivalent in Unicode, so there is nothing inherently wrong there. Pasting both into a decode tool shows the same visible text.
This happens because the same visible text can be represented by more than one sequence of Unicode code points, especially when it contains accented characters (an accent can be a separate combining character, which is precisely what happens in your case), and each sequence URL-encodes differently.
In your case, specifically, the first string encoded é as e + ´ (latin small letter e + combining acute accent, U+0301), resulting in e%CC%81. The second encoded é directly as %C3%A9 (latin small letter e with acute, U+00E9; two % escapes because it takes two bytes in UTF-8).
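A minimal sketch that reproduces the two encodings of é described above. The class name AccentEncodingDemo and the helper encode are made up for illustration; only URLEncoder.encode(String, String) comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class AccentEncodingDemo {
    // Decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    static final String DECOMPOSED = "e\u0301";
    // Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
    static final String PRECOMPOSED = "\u00E9";

    // Hypothetical helper: URL-encodes with an explicit UTF-8 charset.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode(DECOMPOSED));             // e%CC%81
        System.out.println(encode(PRECOMPOSED));            // %C3%A9
        // The two strings are different code point sequences.
        System.out.println(DECOMPOSED.equals(PRECOMPOSED)); // false
    }
}
```

Both calls use the same charset, so the difference in output comes entirely from the input strings.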
Again, there is no problem with either representation. They correspond to two Unicode normalization forms: decomposed (NFD) for the first and precomposed (NFC) for the second. Mac OS X is known to favor the decomposed form, especially in file names; in the end, it depends on whatever produced the string. In your case, there may be different JREs or, if that file name was user generated, the user may have used a different OS (or tool) that produced that form.
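If you want the same output regardless of where the string came from, one option is to normalize it to a single form before encoding. This is a sketch, not the only way to do it: it assumes NFC is the form you want, and the class name NormalizedUrlEncoder and the method encodeNormalized are hypothetical. java.text.Normalizer is part of the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.text.Normalizer;

public class NormalizedUrlEncoder {
    // Hypothetical helper: normalize to NFC (precomposed), then URL-encode as UTF-8.
    static String encodeNormalized(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        try {
            return URLEncoder.encode(nfc, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same file name, once with combining accents and once precomposed.
        String decomposed = "me\u0301dicaux_Je\u0301ro\u0302me.txt";
        String precomposed = "m\u00E9dicaux_J\u00E9r\u00F4me.txt";

        // Both lines print m%C3%A9dicaux_J%C3%A9r%C3%B4me.txt
        System.out.println(encodeNormalized(decomposed));
        System.out.println(encodeNormalized(precomposed));
    }
}
```

Whichever form you pick, apply it consistently on both servers; Normalizer.Form.NFD would work just as well if the consumer expects decomposed text.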
Upvotes: 4