Lerin Sonberg

Reputation: 623

Inserting special HTML characters into XML

I produce an XML string with this expression in JavaScript:

var xml = '<xml xmlns="http://www.w3.org/1999/xhtml">' + dom.outerHTML + '</xml>'

(dom is some node in the document tree.)

Later I read this back with:

... = (new DOMParser).parseFromString(xml, "text/xml");

Usually this works fine, but it fails when one of the fields in dom contains a non-breaking space character (typed manually with Alt+0160). In dom.outerHTML the character appears as &nbsp;, and parseFromString then returns this:

<xml xmlns="http://www.w3.org/1999/xhtml">
    <parsererror style="display: block; white-space: pre; border: 2px solid #c77; padding: 0 1em 0 1em; margin: 1em; background-color: #fdd; color: black">
        <h3>This page contains the following errors:</h3>
        <div style="font-family:monospace;font-size:12px">error on line 1 at column 139: Entity 'nbsp' not defined↵</div>
        <h3>Below is a rendering of the page up to the first error.</h3>
    </parsererror>
    ...
</xml>

(This is actually the function's return value, not an exception, which seems like a very strange design.)
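
Since the failure comes back as a document rather than as an exception, the caller has to look for the <parsererror> element itself. Below is a minimal sketch of that check, not a definitive implementation. The helper names findParserError and parseXmlStrict are hypothetical, and the parser is passed in as an argument (in a browser that would be new DOMParser()).

```javascript
// Hypothetical helpers; neither name belongs to any library.

// Returns the error text if the parsed document contains a <parsererror>
// element, or null if the parse looks clean.
function findParserError(doc) {
  var errors = doc.getElementsByTagName("parsererror");
  return errors.length > 0 ? errors[0].textContent : null;
}

// Parses xml as "text/xml" and turns an embedded <parsererror> into a real
// exception, so the failure cannot go unnoticed.
function parseXmlStrict(xml, parser) {
  var doc = parser.parseFromString(xml, "text/xml");
  var message = findParserError(doc);
  if (message !== null) {
    throw new Error("XML parse failed: " + message);
  }
  return doc;
}
```

In a browser this could be called as parseXmlStrict(xml, new DOMParser()).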

I've also tried &amp;nbsp;. That succeeded without a <parsererror> tag, but it was read back as the literal string "&nbsp;", not as the Unicode code point 160 (U+00A0).

Other HTML named character entities are probably affected too.
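
One possible workaround on the write side is to turn HTML named entities into numeric character references, which an XML parser accepts without any entity declarations. The sketch below is only an illustration under assumptions: htmlEntitiesToNumeric is a hypothetical name, and the table covers just a handful of entities, while HTML defines many more. It only deals with entities, so markup that is not well-formed XML for other reasons would still fail to parse.

```javascript
// Illustrative subset only; a real table would need every HTML named entity.
var NAMED_ENTITIES = {
  nbsp: 160,
  copy: 169,
  reg: 174,
  eacute: 233,
  hellip: 8230
};

// Hypothetical helper: replaces known named entities with numeric character
// references such as &#160;. Names missing from the table, including the five
// that XML predefines (amp, lt, gt, quot, apos), are left untouched.
function htmlEntitiesToNumeric(markup) {
  return markup.replace(/&([A-Za-z][A-Za-z0-9]*);/g, function (match, name) {
    if (Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)) {
      return "&#" + NAMED_ENTITIES[name] + ";";
    }
    return match;
  });
}
```

It would be applied to dom.outerHTML before wrapping the result in the <xml> element.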

Where and how should I escape/replace the special HTML characters to get back exactly the same dom as the original?

Thanks in advance.

Upvotes: 4

Views: 371

Answers (1)

Lerin Sonberg

Reputation: 623

As @forty-two suggested, XMLSerializer solved the problem:

var xml = '<xml xmlns="http://www.w3.org/1999/xhtml">' 
  + (new XMLSerializer).serializeToString(dom) 
  + '</xml>'

This inserts the non-breaking space character directly into the result, with no '&' entity reference, so the read side needs no change. Thanks.

Upvotes: 1
