divz

Reputation: 7967

UTF-8 Issue in xml parsing

I am using the following two snippets to parse XML content as UTF-8, but neither works properly:

1.

InputStream is = new ByteArrayInputStream(strXMLAlert.getBytes("UTF-8"));
Document doc = db.parse(is); 

2.

InputSource is = new InputSource(new ByteArrayInputStream(strXMLAlert.getBytes()));
is.setCharacterStream(new StringReader(strXMLAlert));
is.setEncoding("UTF-8");
Document doc = db.parse(is);

Upvotes: 4

Views: 18571

Answers (1)

Mike Mansell

Reputation: 111

We probably need a bit more information to answer the question properly. For example, what problem are you seeing? Which Java version are you running?

However, expanding your first example into the following complete snippet works for me:

DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
String strXMLAlert = "<a>永</a>";
InputStream is = new ByteArrayInputStream(strXMLAlert.getBytes("UTF-8"));
Document document = db.parse(is);
Node item = document.getDocumentElement().getChildNodes().item(0);
String nodeValue = item.getNodeValue();
System.out.println(nodeValue);

This example puts a Chinese character in the string, and it prints that character (永) correctly.

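Your second example should also work, although you are providing the content twice. When an InputSource has both a character stream and a byte stream, the parser reads the character stream and ignores the byte stream and the encoding. Either provide the content as bytes together with the encoding, or provide it as characters (the StringReader), in which case you don't need the encoding, since characters have already been decoded from bytes. Note also that getBytes() without an argument uses the platform default charset, which may not be UTF-8.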
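As a rough sketch of those two options, something like the following should work. The class name InputSourceSketch and the method names parseFromBytes and parseFromCharacters are made up for illustration; the sample string is the same one-element document used above, written with a Unicode escape so the source file's own encoding doesn't matter.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class and method names, for illustration only.
public class InputSourceSketch {

    // Option A: give the parser bytes and state how they are encoded.
    public static String parseFromBytes(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Encode explicitly; getBytes() with no argument uses the platform default.
        InputSource source = new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        source.setEncoding("UTF-8");
        Document doc = db.parse(source);
        return doc.getDocumentElement().getTextContent();
    }

    // Option B: give the parser characters; no encoding is involved.
    public static String parseFromCharacters(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource source = new InputSource(new StringReader(xml));
        Document doc = db.parse(source);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a>\u6C38</a>"; // \u6C38 is the character 永
        // Compare in code rather than printing the character, so the
        // result does not depend on the console's encoding.
        System.out.println(parseFromBytes(xml).equals("\u6C38"));
        System.out.println(parseFromCharacters(xml).equals("\u6C38"));
    }
}
```

Both calls should print true. If the character only looks wrong when you print it, the problem may be the console's encoding rather than the XML parsing.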

Upvotes: 7
