Reputation: 93
For some reason, I have to decode a string into Chinese characters, like “\u961c”. This string is the Unicode escape for “阜”.
I know how to decode a byte[] into Unicode characters, but is there an easy way to decode a String into Unicode characters?
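If the String really holds the six characters backslash, u, 9, 6, 1, c (for example, text read from a file or a JSON payload) rather than the single character 阜, no byte decoding is involved: the escape itself has to be parsed. Below is a minimal sketch, assuming only the four-hex-digit \uXXXX form needs handling and that doubled backslashes need no special treatment; the class and method names are made up for illustration. Libraries such as Apache Commons Text (StringEscapeUtils.unescapeJava) also cover this case.

```java
public class UnicodeUnescape {
    // Hypothetical helper: replaces each backslash-u escape (backslash, 'u',
    // exactly four hex digits) with the UTF-16 char it denotes.
    // Anything that is not a well-formed escape is copied through unchanged.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (isEscapeAt(s, i)) {
                // The four hex digits are one UTF-16 code unit, e.g. 961c -> U+961C
                int unit = Integer.parseInt(s.substring(i + 2, i + 6), 16);
                out.append((char) unit);
                i += 6;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // True if a backslash, a 'u' and four hex digits start at index i.
    private static boolean isEscapeAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Six characters: backslash, 'u', '9', '6', '1', 'c'
        String escaped = "\\u961c";
        String decoded = unescape(escaped);
        System.out.println(escaped.length() + " -> " + decoded.length()); // prints "6 -> 1"
        System.out.println(decoded.equals("\u961c"));                     // prints "true"
    }
}
```

Because each escape becomes one UTF-16 char, a character above U+FFFF written as two consecutive escapes (a surrogate pair) also comes out correctly.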
By the way, when I call “阜”.getBytes(), I get -100, -104, -23. Does that mean
1001110 10010100 11101001 in binary?
But I think \u961c in Unicode should be 1001 0110 0001 1100 in binary or something like that,
and its UTF-8 form should be 11101001 10011000 10011100 in binary.
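To check the bit patterns from the question, the sketch below prints the code point and its UTF-8 bytes in binary (the class and helper names are hypothetical). Java's byte is signed, so -23 is 0xE9 = 11101001, -104 is 0x98 = 10011000 and -100 is 0x9C = 10011100; in the order -23, -104, -100 they are exactly the expected UTF-8 encoding 11101001 10011000 10011100. The sketch requests UTF-8 explicitly, because getBytes() without an argument uses the platform default charset.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bits {
    // Binary string of value, left-padded with zeros to the given width
    static String bits(int value, int width) {
        return String.format("%" + width + "s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    public static void main(String[] args) {
        String s = "\u961c"; // the single character U+961C

        // The code point itself: 16 bits
        System.out.println("U+961C = " + bits(s.codePointAt(0), 16));

        // Name the charset explicitly; plain getBytes() uses the platform default
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8) {
            // Java bytes are signed; b & 0xFF is the unsigned value 0..255
            System.out.println(b + " = " + bits(b & 0xFF, 8));
        }
    }
}
```

Run as-is, it prints 1001011000011100 for the code point, then -23 = 11101001, -104 = 10011000 and -100 = 10011100.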
Upvotes: 1
Views: 4982
Reputation: 115
In Java, there is no method that encodes a String object into another String (not entirely accurate: a String does have an internal encoding, but that's UTF-16). The only way to encode it is to convert it to a byte[]. So if you need UTF-8 data, then you need a byte[]. If you have a String that contains unexpected data, then the problem is at some earlier place that incorrectly converted binary data to a String (i.e. it used the wrong encoding).
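To illustrate the point about an earlier wrong conversion, here is a hypothetical sketch (the names are made up) that decodes UTF-8 bytes as ISO-8859-1 and then undoes the damage. The repair assumes the wrong charset was ISO-8859-1, which maps every byte to exactly one char and back; a lossy charset cannot be undone this way, and fixing the original conversion is the real remedy.

```java
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Simulates the earlier bug: UTF-8 bytes decoded with the wrong charset.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps every byte 0x00-0xFF to exactly one char and back,
    // so re-encoding with it recovers the original bytes losslessly.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u961c";
        String garbled = misdecode(s);
        System.out.println(garbled.length());          // prints 3: one char per byte
        System.out.println(repair(garbled).equals(s)); // prints true
    }
}
```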
This works, but it produces bytes (a java.nio.ByteBuffer), not a String:
Charset.forName("UTF-8").encode(myString)
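Since Charset.encode returns a java.nio.ByteBuffer, getting a plain byte[] takes one more step. A minimal sketch (the method name toUtf8 is hypothetical) that copies only the encoded bytes, because the buffer's backing array can be larger than its limit:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodeToBuffer {
    static byte[] toUtf8(String myString) {
        // Charset.encode returns a ByteBuffer, not a byte[] or a String
        ByteBuffer buf = Charset.forName("UTF-8").encode(myString);
        // Copy only position..limit; buf.array() may hold extra unused capacity
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] viaBuffer = toUtf8("\u961c");
        byte[] viaGetBytes = "\u961c".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(viaBuffer));            // prints [-23, -104, -100]
        System.out.println(Arrays.equals(viaBuffer, viaGetBytes)); // prints true
    }
}
```

For most code, myString.getBytes(StandardCharsets.UTF_8) gives the same byte[] directly.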
Upvotes: 1