Marshmellow1328

Reputation: 1265

Encoding xml using ascii encoding instead of character entities

Alright, so here is my issue. I need to generate XML in Java to pass on to another application. I started off thinking this would be easy using an org.w3c.dom.Document. Unfortunately, the application I need to pass the XML to requires that special characters like " be encoded as numeric ASCII codes (&#034;) instead of their character entities (&quot;). Does anybody know a simple solution to this?

P.S. Changing the target application is not an option.

Update: So let's say my app is given the following string as input:

he will "x" this if needed

My app needs to output this:

<field value="he will &#034;x&#034; this if needed"/>

The XML generator I am using (and, I am guessing, most others) outputs the following, which is not valid for my target:

<field value="he will &quot;x&quot; this if needed"/>

I realize my target may not quite be up to XML standards, but that doesn't help me as I have no control over it. This is my situation and I have to deal with it. Any ideas other than simply converting every special character by hand?

Upvotes: 1

Views: 8172

Answers (2)

iter

Reputation: 4313

I wonder how you serialize the XML--to a string, a stream, etc. You can post-process your output to replace general entity references with their numeric equivalents, e.g.,

sed 's/&lt;/\&#60;/g; s/&gt;/\&#62;/g; s/&amp;/\&#38;/g; s/&apos;/\&#39;/g; s/&quot;/\&#34;/g'

or

xmlResultString = xmlResultString.replace("&lt;", "&#60;"); // strings are immutable, so keep the result; etc. for other entities

There are exactly 5 pre-defined general entities in XML (http://www.w3.org/TR/REC-xml/#sec-predefined-ent) and you can safely perform this as a textual replacement. There is no danger that it modifies anything except the references (well, maybe in comments and PIs, but it doesn't sound like your scenario uses them, or that the target even accepts them).
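
A minimal Java sketch of that post-processing step, covering all five predefined entities, might look like the following. The class and method names (EntityRewriter, toNumericReferences) are made up for illustration, and it assumes the serialized XML holds no comments, PIs or CDATA sections containing these sequences.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EntityRewriter {
    // The five predefined XML entities mapped to decimal character references.
    private static final Map<String, String> NUMERIC = new LinkedHashMap<>();
    static {
        NUMERIC.put("&lt;", "&#60;");
        NUMERIC.put("&gt;", "&#62;");
        NUMERIC.put("&amp;", "&#38;");
        NUMERIC.put("&apos;", "&#39;");
        NUMERIC.put("&quot;", "&#34;");
    }

    // Hypothetical helper: rewrites entity references in already-serialized XML.
    public static String toNumericReferences(String xml) {
        String result = xml;
        for (Map.Entry<String, String> e : NUMERIC.entrySet()) {
            // String.replace is literal (no regex) and returns a new string.
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<field value=\"he will &quot;x&quot; this if needed\"/>";
        System.out.println(toNumericReferences(xml));
    }
}
```

None of the replacement strings contains a named entity, so the order of the substitutions should not matter.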

I agree with Mark that your target application is not a conforming XML processor. At least it comes with documentation that states explicitly where it diverges from XML. I believe the Recommendation (link above) disagrees with Christopher's comment, though it's irrelevant to OP's question as his target declares its non-conformance to the Recommendation.

Ari.

Upvotes: 2

McDowell

Reputation: 108869

To my knowledge, the standard API doesn't expose the escape mechanism. You'd probably need to write your own XML emitter.
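
As a rough idea of what a hand-written emitter for an org.w3c.dom.Document could involve, here is a small sketch. TinyEmitter and its methods are hypothetical names; it handles only elements, attributes and text, ignores comments, PIs, CDATA and namespaces, and escapes only the five markup characters.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class TinyEmitter {
    // Escapes the five markup characters as decimal character references.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': case '\'': case '&': case '<': case '>':
                    out.append("&#").append((int) c).append(';');
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    // Writes an element, its attributes, and its element/text children.
    static void write(Element el, StringBuilder out) {
        out.append('<').append(el.getTagName());
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            out.append(' ').append(a.getName()).append("=\"")
               .append(escape(a.getValue())).append('"');
        }
        if (!el.hasChildNodes()) {
            out.append("/>");
            return;
        }
        out.append('>');
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                write((Element) n, out);
            } else if (n.getNodeType() == Node.TEXT_NODE) {
                out.append(escape(n.getNodeValue()));
            }
        }
        out.append("</").append(el.getTagName()).append('>');
    }

    // Builds the example from the question and returns the emitted markup.
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element field = doc.createElement("field");
            field.setAttribute("value", "he will \"x\" this if needed");
            doc.appendChild(field);
            StringBuilder out = new StringBuilder();
            write(doc.getDocumentElement(), out);
            return out.toString();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```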

If you don't mind a 3rd party API, you could use JDOM. Something like:

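// Imports assume JDOM 1.x, whose classes live in the org.jdom package.
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

XMLOutputter outputter = new XMLOutputter() {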
  @Override
  public String escapeAttributeEntities(String sequence) {
    // TODO: bug: code only works for Basic Multilingual Plane
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < sequence.length(); i++) {
      process(sequence.charAt(i), out);
    }
    return out.toString();
  }

  private void process(char codePoint, StringBuilder out) {
    if (codePoint == '"' || codePoint == '\'' || codePoint == '&'
        || codePoint == '<' || codePoint == '>' || codePoint > 127) {
      out.append("&#");
      out.append(Integer.toString(codePoint));
      out.append(";");
    } else {
      out.append(codePoint);
    }
  }
};
outputter.setFormat(Format.getPrettyFormat().setEncoding("US-ASCII"));

Element foo = new Element("foo").setAttribute("msg",
    "he will \"x\" this if needed");
Document doc = new Document().setRootElement(foo);
outputter.output(doc, System.out);

This emits:

<?xml version="1.0" encoding="US-ASCII"?>
<foo msg="he will &#34;x&#34; this if needed" />

(I'd still give the XML spec a once-over before doing this and fix up the character handling to support characters above U+FFFF.)
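
One way to handle characters above U+FFFF is to walk the string by code point rather than by char. The sketch below is a standalone helper (NumericEscaper and escape are made-up names, not part of JDOM) that applies the same escaping rule as the code above. It does not check for characters that XML forbids outright.

```java
public class NumericEscaper {
    // Hypothetical helper: escapes markup characters and every non-ASCII
    // code point as a decimal character reference.
    public static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // codePointAt reads a valid surrogate pair as a single code point.
            int cp = text.codePointAt(i);
            if (cp == '"' || cp == '\'' || cp == '&'
                    || cp == '<' || cp == '>' || cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            // Advance by 1 char for BMP characters, 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("he will \"x\" this if needed"));
        System.out.println(escape("\uD83D\uDE00")); // U+1F600, outside the BMP
    }
}
```

The body of escapeAttributeEntities could delegate to a method like this instead of looping over individual chars.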

Upvotes: 1
