Reputation: 265
I need to strip all text content from an HTML file, keeping only the HTML tags.
Can this be done with a regular expression or with JavaScript?
BEFORE:
<html>
<head>
<title>Ask a Question - Stack Overflow</title>
<link rel="shortcut icon" href="//cdn.sstatic.net/stackoverflow/img/favicon.ico">
<script type="text/javascript">
document.write("Code remains un-touched");
</script>
</head>
<body class="ask-page new-topbar">
<div id="first">ONE</div>
<div id="sec">TWO</div>
<div id="third">THREE</div>
</body>
</html>
AFTER:
<html>
<head>
<title></title>
<link rel="shortcut icon" href="//cdn.sstatic.net/stackoverflow/img/favicon.ico">
<script type="text/javascript">
document.write("Code remains un-touched");
</script>
</head>
<body class="ask-page new-topbar">
<div id="first"></div>
<div id="sec"></div>
<div id="third"></div>
</body>
</html>
UPDATE: I need to keep working with the HTML tags afterwards: once the text content has been stripped, the resulting HTML should be displayed. In the end, I am interested in the HTML code itself.
Upvotes: 0
Views: 67
Reputation: 54649
A simple recursive function would work:
(function removeTextNodes(el) {
    // copy the live childNodes list into an array, since we remove nodes while iterating
    Array.apply([], el.childNodes).forEach(function (child) {
        if (child.nodeType === 3 && el.nodeName !== 'SCRIPT') {
            // remove the text node
            el.removeChild(child);
        }
        else if (child.nodeType === 1) {
            // call recursive for child nodes
            removeTextNodes(child);
        }
    });
})(document.documentElement);
Quoting Amadan: just use document.documentElement.outerHTML to get the HTML as a string.
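If it helps, here is a rough reusable variant of the same idea, not a definitive implementation. The names stripText, SKIP_TAGS and keepWhitespace are made up for this sketch. Two assumptions go beyond the answer above: STYLE is skipped as well as SCRIPT, and whitespace-only text nodes can optionally be kept so the line breaks between tags survive, as in the AFTER listing of the question.

```javascript
// Sketch only: stripText, SKIP_TAGS and keepWhitespace are illustrative names.
// Skipping STYLE in addition to SCRIPT is an assumption, not part of the original answer.
var SKIP_TAGS = { SCRIPT: true, STYLE: true };

function stripText(el, keepWhitespace) {
    // Copy the child list first, because removing children changes it while we iterate.
    Array.prototype.slice.call(el.childNodes).forEach(function (child) {
        if (child.nodeType === 3) {
            // Text node: keep it inside skipped tags, or when it is blank and blanks are wanted.
            var blank = /^\s*$/.test(child.nodeValue);
            if (!SKIP_TAGS[el.nodeName] && !(keepWhitespace && blank)) {
                el.removeChild(child);
            }
        } else if (child.nodeType === 1) {
            // Element node: recurse into its children.
            stripText(child, keepWhitespace);
        }
    });
    return el;
}

// Only runs where a DOM is available (i.e. in a browser).
if (typeof document !== 'undefined') {
    stripText(document.documentElement, true);
    console.log(document.documentElement.outerHTML);
}
```

Passing false (or nothing) as the second argument removes every text node outside the skipped tags, which matches the behaviour of the function above.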
Upvotes: 3
Reputation: 198314
I'm thinking something like this should work:
$('*').each(function() {
    $(this).contents().filter(function() {
        return this.nodeType == 3 && this.parentNode.nodeName != 'SCRIPT';
    }).remove();
});
Iterate over all elements and look at all their child nodes; if they're text nodes and not inside script, kill 'em.
You can test on this very page :P
(Yoshi's jQueryless script is faster, but this was shorter to write :P )
EDIT: nodeName is in caps. Oops.
EDIT for OP's edit: This will subsequently fetch the source code:
$('html')[0].outerHTML
and you can display it using:
$('body').text($('html')[0].outerHTML)
EDIT again: Also, if you want it jQueryless, you can do document.documentElement.outerHTML instead (which is both faster and nicer). Works with Yoshi's solution, too.
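For the display step, a jQueryless equivalent could look roughly like the sketch below. showSource is a hypothetical helper name, and the function takes the document as a parameter only so it can be tried outside a browser; it assumes the text nodes have already been stripped.

```javascript
// Sketch only: showSource is a hypothetical helper name, not an existing API.
function showSource(doc) {
    // Read the markup first; assigning textContent below replaces the body's children.
    var html = doc.documentElement.outerHTML;
    // textContent inserts the string as plain text, so the tags are shown rather than parsed.
    doc.body.textContent = html;
    return html;
}

// In a browser, after stripping the text nodes: showSource(document);
```

Like the jQuery one-liner, this replaces the page body with its own source, so run it last.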
Upvotes: 2