zhaoch93

Reputation: 93

How to decode a String (not a byte[]) in UTF-8 format into another String in Java?

For some reason, I have to decode a string that contains a Chinese character, like this: “\u961c”. This string is the UTF-8 form of “阜”.

I know how to decode a byte[] into Unicode characters, but is there an easy way to decode a String into Unicode characters?

By the way, when I call “阜”.getBytes(), I get -100, -104, -23. Does that mean

1001110 10010100 11101001 in binary?

But I think the Unicode value \u961c should be 1001 0110 0001 1100 in binary, or something like that,

and its UTF-8 form should be 11101001 10011000 10011100 in binary.

Upvotes: 1

Views: 4982

Answers (1)

Isuru Rangana

Reputation: 115

In Java, there is no method that encodes a String object into another String. (That is not entirely accurate: a String does have an encoding, but it is always UTF-16.)

The only way is to encode to a byte[]. So if you need UTF-8 data, you need a byte[]. If you have a String that contains unexpected data, the problem lies at some earlier point, where binary data was incorrectly converted to a String (i.e. with the wrong encoding).
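To make the byte[] point concrete, here is a minimal sketch that encodes the character from the question to UTF-8 and prints each byte in binary. The class name Utf8RoundTrip and the helpers toUtf8 and toBinary are made up for illustration. Only String.getBytes(Charset) and the String(byte[], Charset) constructor are standard library calls.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Encode a String to UTF-8 bytes, naming the charset explicitly
    // instead of relying on the platform default.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Render one byte as 8 binary digits, treating it as unsigned.
    static String toBinary(byte b) {
        String bits = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        // The compiler turns this escape into the single character U+961C.
        String s = "\u961c";
        byte[] utf8 = toUtf8(s);
        for (byte b : utf8) {
            System.out.println(b + " -> " + toBinary(b));
        }
        // Decoding the bytes with the same charset restores the original String.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(s));
    }
}
```

This should print -23 -> 11101001, -104 -> 10011000, -100 -> 10011100, and then true. Java bytes are signed, so the negative numbers are the same three UTF-8 bytes (E9 98 9C) that the question expected, in that order.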

The following works, but it produces bytes (a ByteBuffer), not a String:

Charset.forName("UTF-8").encode(myString)
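If the String at runtime really holds the six literal characters backslash, u, 9, 6, 1, c (for example because it was read from a file or a network response), then it is not UTF-8 at all. It is a textual Unicode escape, and it can be turned into the real character by parsing the hex digits. Below is a rough sketch under that assumption. The class UnicodeUnescape and the method unescapeUnicode are hypothetical names, and the code only handles four-digit escapes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescape {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replace every textual escape in the input with the UTF-16 code unit it names.
    static String unescapeUnicode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text before the escape unchanged.
            out.append(input, last, m.start());
            // Parse the four hex digits and append them as one char.
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        // Copy whatever follows the last escape.
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Six characters at runtime: backslash, u, 9, 6, 1, c.
        String raw = "\\u961c";
        String decoded = unescapeUnicode(raw);
        System.out.println(raw.length() + " -> " + decoded.length());
        System.out.println(decoded.charAt(0) == 0x961C);
    }
}
```

If the escape appears directly in Java source as a string literal, no decoding is needed, because the compiler already converts it to the character before the program runs.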

Upvotes: 1
