Mateus Viccari

Reputation: 7709

How to generate XML with correct character encoding?

I have this code:

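import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();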
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

Document doc = docBuilder.newDocument();
Element rootElement = doc.createElement("company");
doc.appendChild(rootElement);

Element staff = doc.createElement("Staff");
rootElement.appendChild(staff);

Attr attr = doc.createAttribute("id");
attr.setValue("1");
staff.setAttributeNode(attr);

Element firstname = doc.createElement("firstname");
firstname.appendChild(doc.createTextNode("† José do Capêta †"));
staff.appendChild(firstname);

TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
StringWriter writer = new StringWriter();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(doc), new StreamResult(writer));
String output = writer.getBuffer().toString();

System.out.println(output);

This is supposed to generate an XML file. The XML should contain the characters exactly as I wrote them in the code, special characters and all. But when I run it, the output is this:

<company>
<Staff id="1">
<firstname>† José do Capêta †</firstname>
</Staff>
</company>

So when I try to open it with any XML reader, I get an error because it cannot read those special characters.

I know it should generate the following XML; I just don't know how to do it:

<company>
<Staff id="1">
<firstname>&#8224; Jos&#233; do Cap&#234;ta &#8224;</firstname>
</Staff>
</company>
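For reference, a minimal sketch of one way to get exactly that character-reference form, assuming the JDK's built-in Transformer: when the declared output encoding cannot represent a character (here US-ASCII), the serializer writes it as a numeric character reference instead. AsciiXmlExample and render are hypothetical names. The string literal uses Unicode escapes, so the example does not depend on the encoding the .java file was saved in.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class AsciiXmlExample {

    // Builds the question's document and serializes it with a US-ASCII output
    // encoding; characters outside ASCII are then written as &#NNN; references.
    static String render() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element company = doc.createElement("company");
        doc.appendChild(company);
        Element staff = doc.createElement("Staff");
        staff.setAttribute("id", "1");
        company.appendChild(staff);
        Element firstname = doc.createElement("firstname");
        // Unicode escapes keep the example independent of the source file's encoding
        firstname.appendChild(doc.createTextNode("\u2020 Jos\u00e9 do Cap\u00eata \u2020"));
        staff.appendChild(firstname);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // Declare an encoding that cannot represent the special characters,
        // so the serializer has to escape them
        transformer.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");

        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render());
    }
}
```

This only changes how the characters are spelled in the file; a file containing the raw characters is equally valid XML, as long as its bytes really are in the encoding its declaration names.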

So what is the solution for this?

Upvotes: 0

Views: 176

Answers (4)

JamesB

Reputation: 7894

You should use a ByteArrayOutputStream to write out the XML and then convert it to a String.

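import java.io.ByteArrayOutputStream;

// The Transformer encodes the bytes itself (UTF-8 unless OutputKeys.ENCODING says otherwise)
ByteArrayOutputStream bos = new ByteArrayOutputStream();
StreamResult result = new StreamResult(bos);
transformer.transform(new DOMSource(doc), result);
// Decode with the same charset; toString(String) throws UnsupportedEncodingException
String output = bos.toString("UTF-8");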

[Edit]

If you need to write the bytes to a file, you can do the following:

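import java.io.FileOutputStream;
import java.io.IOException;

// Write the raw bytes unchanged: no Writer is involved, so nothing is re-encoded
try (FileOutputStream fos = new FileOutputStream("someName.xml")) {
  fos.write(bos.toByteArray());
} catch (IOException ioe) {
  ioe.printStackTrace();
}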
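A self-contained version of the ByteArrayOutputStream approach, as a sketch assuming the JDK's default Transformer; Utf8BytesExample, sampleDoc and toUtf8Bytes are hypothetical names. It sets OutputKeys.ENCODING explicitly (an addition to the answer) so that the charset used for the bytes, the one named in the XML declaration, and the one used to decode them are all the same.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class Utf8BytesExample {

    // A cut-down version of the question's document
    static Document sampleDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element company = doc.createElement("company");
        doc.appendChild(company);
        Element firstname = doc.createElement("firstname");
        firstname.appendChild(doc.createTextNode("\u2020 Jos\u00e9 do Cap\u00eata \u2020"));
        company.appendChild(firstname);
        return doc;
    }

    // Serializes to bytes; the Transformer does the encoding, so the bytes and
    // the encoding named in the XML declaration agree
    static byte[] toUtf8Bytes(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, StandardCharsets.UTF_8.name());
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(bos));
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = toUtf8Bytes(sampleDoc());
        // Decode with the same charset the Transformer used
        String output = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(output);
    }
}
```

If the printed text still looks garbled, the console itself may not be using UTF-8; the byte array (or a file written from it) is what an XML reader should be given.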

Upvotes: 1

Augusto

Reputation: 29857

If my memory serves me right, the problem is the method you're calling. Try using setTextContent.

For example:

firstname.setTextContent("† José do Capêta †");

That should escape the text automatically.

Upvotes: 0

jtahlborn

Reputation: 53694

Don't generate your XML to a String or Writer. Generate the XML to bytes and write the bytes directly to a file (or generate directly to a file). When you generate the XML to a String/Writer, the bytes that end up on disk are decided by whatever writes that text out later, usually with the platform default character encoding, which may not match the encoding named in the XML declaration. In general, you should use UTF-8 unless you have a really good reason not to (which is generally what the XML library will do on its own if you write to an OutputStream instead of a Writer).
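A minimal sketch of generating straight to a file, assuming the JDK's default Transformer; WriteXmlFileExample and its methods are hypothetical names, and setting OutputKeys.ENCODING to UTF-8 explicitly is an addition (UTF-8 is also the default for the xml output method). The file is then parsed back to check that a standard XML reader recovers the original text.

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class WriteXmlFileExample {

    static Document sampleDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element company = doc.createElement("company");
        doc.appendChild(company);
        Element firstname = doc.createElement("firstname");
        firstname.appendChild(doc.createTextNode("\u2020 Jos\u00e9 do Cap\u00eata \u2020"));
        company.appendChild(firstname);
        return doc;
    }

    // Serializes straight into the file's OutputStream: no String and no Writer,
    // so the platform default charset never gets involved
    static void writeXml(Document doc, Path target) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        try (OutputStream out = Files.newOutputStream(target)) {
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    // Parses the file with a standard parser and returns the <firstname> text
    static String readFirstname(Path source) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(source.toFile());
        return doc.getElementsByTagName("firstname").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("staff", ".xml");
        writeXml(sampleDoc(), file);
        // true if the round trip preserved every character
        System.out.println(readFirstname(file).equals("\u2020 Jos\u00e9 do Cap\u00eata \u2020"));
        Files.delete(file);
    }
}
```

The round trip works because the parser takes the encoding from the XML declaration, and the Transformer wrote that declaration to match the bytes it produced.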

Upvotes: 1

Razor Wedner

Reputation: 33

You could use escapeXml from the org.apache.commons.lang.StringEscapeUtils class (Apache Commons Lang), so you could write:

String foo = StringEscapeUtils.escapeXml("& # shall be escaped!");
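If adding Commons Lang is not an option, a small JDK-only stand-in could look like this. NumericCharRefs.escape is a hypothetical helper, not part of any library: it escapes the five predefined XML entities and writes every non-ASCII code point as a decimal character reference, which produces the &#8224; form from the question. It is meant for text concatenated into XML by hand; passing its result to createTextNode would make the serializer escape each & a second time.

```java
class NumericCharRefs {

    // Hypothetical helper: escapes the five predefined XML entities and writes
    // every non-ASCII code point as a decimal character reference (U+00E9 -> &#233;).
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (cp < 0x80) {
                        sb.appendCodePoint(cp);
                    } else {
                        // codePoints() joins surrogate pairs, so each character gets one reference
                        sb.append("&#").append(cp).append(';');
                    }
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: &#8224; Jos&#233; do Cap&#234;ta &#8224;
        System.out.println(escape("\u2020 Jos\u00e9 do Cap\u00eata \u2020"));
    }
}
```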

You can also use an external library.

Upvotes: -1
