Reputation: 11
I'm having trouble taking a DOM document (or a node in it) and serializing it as well-formed XML. I need this because the tools I will upload parts of the document to understand only XML, not HTML with its unclosed elements. As an example, I'm currently scraping (among many sites) http://studentlund.se, which showcases my problem: img elements are not closed.
For example, if I execute the following in Chrome's console:
$('<div>').append($('body ul:first li:last')).html()
I'll receive:
<li><a href="http://studentlund.se/feed/"><img src="http://studentlund.se/wordpress/wp-content/themes/studentlund/pics/rss.png" alt="RSS"></a></li>
The img element is not closed, so my XML parser fails.
If I use the XMLSerializer:
n = $('body ul:first li:last').get(0)
new XMLSerializer().serializeToString(n)
I will get the same, incorrectly formatted XML:
<li><a href="http://studentlund.se/feed/"><img src="http://studentlund.se/wordpress/wp-content/themes/studentlund/pics/rss.png" alt="RSS"></a></li>
All I want is to dump the raw DOM of a node as a well-formed XML string so I can use it with my XML tools. Is this possible?
Upvotes: 1
Views: 480
Reputation: 1839
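Try creating an XML document, copying the node into it, and then serializing that document to a string, something like this:
// Grab the DOM node to serialize.
var n = $('body ul:first li:last').get(0);
// Create an empty XML document (no namespace, no root element, no doctype).
var doc = document.implementation.createDocument('', '', null);
// Import a deep copy so the node is not removed from the page.
doc.appendChild(doc.importNode(n, true));
var xml = new XMLSerializer().serializeToString(doc);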
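If XMLSerializer still gives you HTML-style output in your browser, one possible fallback is to walk the node yourself and write out the XML by hand. The following is only a minimal sketch: toXml and escapeXml are hypothetical helper names (not part of jQuery or the DOM API), and it assumes you only care about elements, attributes and text. Namespaces, comments and CDATA sections are ignored.

```javascript
// Hypothetical helpers (not part of any library): walk a DOM node and emit
// well-formed XML, self-closing every element that has no children.

// Escape the characters that are special in XML text and attribute values.
function escapeXml(text, isAttribute) {
  var out = String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  return isAttribute ? out.replace(/"/g, '&quot;') : out;
}

function toXml(node) {
  if (node.nodeType === 3) {
    // Text node: emit its escaped content.
    return escapeXml(node.nodeValue, false);
  }
  if (node.nodeType !== 1) {
    // Anything that is not an element (comments, doctypes, ...) is skipped.
    return '';
  }
  // HTML documents report nodeName in upper case, so prefer localName.
  var name = node.localName || node.nodeName.toLowerCase();
  var xml = '<' + name;
  for (var i = 0; i < node.attributes.length; i++) {
    var attr = node.attributes[i];
    xml += ' ' + attr.name + '="' + escapeXml(attr.value, true) + '"';
  }
  if (node.childNodes.length === 0) {
    // Empty element such as img or br: close it explicitly.
    return xml + '/>';
  }
  xml += '>';
  for (var j = 0; j < node.childNodes.length; j++) {
    xml += toXml(node.childNodes[j]);
  }
  return xml + '</' + name + '>';
}
```

In the console you would call it as toXml($('body ul:first li:last').get(0)). Because it only reads nodeType, localName, attributes and childNodes, it leaves the page untouched, and the img element comes out as a self-closed tag.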
Upvotes: 1