Reputation: 7967
I am using the following two code snippets to parse XML content as UTF-8, but neither works properly:
1.
InputStream is = new ByteArrayInputStream(strXMLAlert.getBytes("UTF-8"));
Document doc = db.parse(is);
2.
InputSource is = new InputSource(new ByteArrayInputStream(strXMLAlert.getBytes()));
is.setCharacterStream(new StringReader(strXMLAlert));
is.setEncoding("UTF-8");
Document doc = db.parse(is);
Upvotes: 4
Views: 18571
Reputation: 111
We probably need a bit more information to answer the question properly. For example, what problem are you seeing? Which Java version are you running?
However, your first example works once it is expanded into a complete snippet:
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
String strXMLAlert = "<a>永</a>";
InputStream is = new ByteArrayInputStream(strXMLAlert.getBytes("UTF-8"));
Document document = db.parse(is);
Node item = document.getDocumentElement().getChildNodes().item(0);
String nodeValue = item.getNodeValue();
System.out.println(nodeValue);
The string in this example contains a Chinese character, and the snippet prints it correctly:
永
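Your second example should also work, although it supplies the content twice. Either supply it as bytes together with the encoding, or supply it as characters (the StringReader), in which case no encoding is needed, because characters have already been decoded from bytes. When an InputSource has a character stream set, the parser reads that and ignores the byte stream and the encoding, so the setEncoding("UTF-8") call in your second example has no effect.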
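To make the "bytes plus encoding, or characters only" choice concrete, here is a minimal sketch of both options side by side. The class name XmlEncodingSketch and the helper methods parseFromBytes and parseFromChars are hypothetical names chosen for illustration; the character 永 is written as the escape \u6c38 so that the result does not depend on the encoding of the source file.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class XmlEncodingSketch {

    // Option A: give the parser bytes and state the encoding they were written in.
    static String parseFromBytes(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        InputSource source = new InputSource(new ByteArrayInputStream(utf8));
        source.setEncoding("UTF-8");
        Document doc = db.parse(source);
        return doc.getDocumentElement().getTextContent();
    }

    // Option B: give the parser characters; no encoding is involved
    // because the String has already been decoded.
    static String parseFromChars(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a>\u6c38</a>"; // \u6c38 is the character 永
        // Compare against the expected character instead of printing it,
        // so the check does not depend on the console encoding.
        System.out.println(parseFromBytes(xml).equals("\u6c38"));
        System.out.println(parseFromChars(xml).equals("\u6c38"));
    }
}
```

If both options return the right text in a test like this but your real output still looks wrong, the problem is probably outside the parser, for example in how strXMLAlert was read in the first place or in the encoding of the console you print to.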
Upvotes: 7