Reputation: 554
Using Java, I am constructing some XML. Some nodes in the XML may have values in Korean or some other language. After constructing it, how do I make sure that the whole XML is encoded in UTF-8? Do I need to explicitly convert the string to UTF-8 with something like:
string = new String(s.getBytes(), "UTF-8");
Or will the whole string be automatically in UTF-8?
Also, if I get some XML containing something like <name>[B@19821f</name>, how do I know that [B@19821f is the UTF-8 form of some Korean word?
Upvotes: 0
Views: 368
Reputation: 692151
A string contains characters. The encoding is irrelevant until you transform the string into bytes. This happens when you call String.getBytes(), or when you write the String to a stream (file, socket, whatever).
Make sure you use an OutputStreamWriter to write your XML string, and that you specify UTF-8 as the charset when constructing this OutputStreamWriter. If you're using a dedicated marshalling API like JAXB, set the appropriate property so that the UTF-8 encoding is used and the generated XML declares its encoding (in the <?xml ...?> header). Without knowing which API you're using to generate your XML string, it's hard to be more helpful.
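A minimal sketch of the OutputStreamWriter approach, assuming the XML is already available as a String. The class name Utf8XmlWriter, the helper toUtf8Bytes, and the in-memory ByteArrayOutputStream are illustrative choices only; in practice the target would be your file or socket stream:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8XmlWriter {
    // Hypothetical helper: encodes the XML text as UTF-8 by writing it
    // through an OutputStreamWriter with an explicit charset.
    static byte[] toUtf8Bytes(String xml) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(xml);
        } // closing the writer flushes the encoded bytes into 'out'
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Sample Korean text written with Unicode escapes so the source file's
        // own encoding does not matter.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<name>\uD55C\uAD6D\uC5B4</name>";
        byte[] bytes = toUtf8Bytes(xml);
        System.out.println(bytes.length + " bytes written");
    }
}
```

The charset is passed explicitly so the result does not depend on the platform default encoding.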
Upvotes: 1
Reputation: 308249
First: the code you posted to "change the string to UTF8" is wrong. You never want to use that (*).
If you parse XML (and the XML is correctly encoded), then you'll already get String values in Java that hold the correctly decoded text, so there is nothing else you need to do. Just handle the String objects as you normally would.
(*) There are a few cases where you have to "undo" damage already done, and there this might be useful, but those cases are very rare, and even then it will usually not work correctly.
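To illustrate the parsing point, here is a small sketch that assumes the standard JAXP DOM parser; the class name ParseUtf8Xml and the helper rootText are made up for this example. It also shows, as a side note on the question, that a value like [B@19821f has the shape Java produces when a byte[] is converted to a string directly, so it is probably an object identifier rather than encoded text:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseUtf8Xml {
    // Hypothetical helper: parses XML bytes and returns the root element's text.
    // The parser decodes the bytes using the encoding the document declares.
    static String rootText(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String korean = "\uD55C\uAD6D\uC5B4";
        byte[] xmlBytes = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>"
                + korean + "</name>").getBytes(StandardCharsets.UTF_8);

        // The parsed value is already a correctly decoded String.
        System.out.println(rootText(xmlBytes).equals(korean)); // true

        // Calling toString() on a byte[] yields "[B@" plus a hash code,
        // not the bytes' content.
        System.out.println(xmlBytes.toString().startsWith("[B@")); // true
    }
}
```

If such a value shows up inside your XML, the likely cause is that a byte[] was concatenated into the document instead of a decoded String.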
Upvotes: 1