Reputation: 4418
My site has a search feature: we build the query, send it in a request to a vendor, and the response comes back as JSON. The vendor crawls our site, captures the data, and returns it in the response. In our design we convert the JSON into Java objects using Gson. We declare UTF-8 as the charset in the page's meta tag.
Depending on the request, the response sometimes contains Unicode escape sequences for special characters, and the browser renders them in a strange way. How should I decode these Unicode escapes?
For example, the special character 'ndash' appears in the response encoded as '\u2013'.
Upvotes: 1
Views: 8140
Reputation: 49237
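To clarify the difference between Unicode and a character encoding: Unicode is a character set that assigns a code point to every character, while a character encoding (such as UTF-8 or UTF-16) defines how those code points are represented as bytes.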
A Java String is always UTF-16. Hence, when you construct a String from bytes, you can use the following String constructor:
new String(byte[], encoding)
The second argument should be the encoding the client used when sending the characters. If you don't explicitly define an encoding, you get the default system encoding, which you can inspect using Charset.defaultCharset().
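As a minimal sketch of the constructor above, the following decodes the three UTF-8 bytes of the en dash (U+2013) with an explicit charset. The class name DecodeExample and the helper decodeUtf8 are made up for illustration, and the sketch assumes the incoming bytes really are UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeExample {
    // Hypothetical helper: decode raw bytes with an explicit charset
    // instead of relying on the platform default.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 byte sequence for the en dash, U+2013: E2 80 93
        byte[] raw = {(byte) 0xE2, (byte) 0x80, (byte) 0x93};

        String decoded = decodeUtf8(raw);
        // One UTF-16 code unit, equal to the en dash
        System.out.println(decoded.length());

        // Decoding the same bytes with the wrong charset yields three
        // unrelated characters instead of one en dash.
        String wrong = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length());

        // The charset used when none is specified
        System.out.println(Charset.defaultCharset());
    }
}
```

Passing a StandardCharsets constant rather than a charset name string avoids the checked UnsupportedEncodingException that the String-name overload declares.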
You can manually set the default encoding with an argument when starting the JVM:
-Dfile.encoding="utf-8"
Although rarely needed, you can also employ CharsetDecoder/CharsetEncoder.
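One case where a CharsetDecoder can be useful is strict decoding: the String constructor silently replaces malformed bytes with U+FFFD, whereas a decoder can be configured to report them. The sketch below assumes UTF-8 input; the class name StrictDecode and the method decodeStrict are hypothetical names chosen for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Hypothetical helper: decode UTF-8 strictly, failing on bad input
    // instead of substituting the replacement character.
    static String decodeStrict(byte[] raw) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Rethrow as unchecked so callers are not forced to handle it
            throw new IllegalArgumentException("Input is not valid UTF-8", e);
        }
    }

    public static void main(String[] args) {
        // Complete UTF-8 sequence for the en dash: decodes normally
        byte[] valid = {(byte) 0xE2, (byte) 0x80, (byte) 0x93};
        System.out.println(decodeStrict(valid).length());

        // Truncated sequence: the strict decoder rejects it
        byte[] truncated = {(byte) 0xE2, (byte) 0x80};
        try {
            decodeStrict(truncated);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```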
Upvotes: 5