Reputation: 176
I have a plugin that looks through the HTML and replaces text. However, with my current implementation, text inside script tags is getting caught by the search as well. This breaks the scripts on the affected pages.
var pageText = document.body.innerHTML;
document.body.innerHTML = pageText.replace(regextgoeshere, 'replacement');
I tried my best to filter them out with my regex pattern, but I need a way to skip all tags.
Is there a way to skip all tags when getting innerHTML?
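To make the failure concrete: innerHTML returns the markup as one string, script bodies included, so a regex applied to it cannot tell page text from code. A minimal sketch, where the markup string and the `/count/g` pattern are hypothetical stand-ins for the real page and the asker's regex:

```javascript
// Hypothetical snapshot of document.body.innerHTML (an assumption for illustration)
const pageText = '<p>Item count: 3</p><script>var count = 3;</script>';

// A naive global replace runs over the whole string, script body included
const replaced = pageText.replace(/count/g, 'total count');

// Pull the (now modified) script body back out and check whether it still parses
const scriptBody = /<script>([\s\S]*?)<\/script>/.exec(replaced)[1];
let scriptParses = true;
try {
  new Function(scriptBody); // throws SyntaxError on invalid code
} catch (e) {
  scriptParses = false;
}

console.log(replaced);
console.log(scriptParses); // false: "var total count = 3;" is a syntax error
```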
Upvotes: 2
Views: 2927
Reputation: 145
I think we tend to think in terms of elements and miss nodes! This problem is best solved by thinking in terms of nodes.
Australian Alex Dickson has the best solution: http://blog.alexanderdickson.com/javascript-replacing-text
function myRecursiveSearch(node,.....) {
    // Elements whose contents must never be rewritten
    var excludeElements = ['script', 'style', 'iframe', 'canvas'];
    var child = node.firstChild;
    if (child == null)
        return;
    do {
        switch (child.nodeType) {
        case 1: // element node: recurse unless it is excluded
            if (excludeElements.indexOf(child.tagName.toLowerCase()) > -1) {
                continue; // in a do...while, continue jumps to the nextSibling condition
            }
            myRecursiveSearch(child,.....);
            break;
        case 3: // text node: rewrite its value in place
            child.nodeValue = doReplace(child.nodeValue,.....);
            break;
        }
    } while (child = child.nextSibling);
}
function doReplace(strtext,....) {
    .....
    return strtext;
}
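A self-contained sketch of the same node-walking idea with the elided arguments filled in by hypothetical `pattern` and `replacement` parameters (my assumption, not the linked article's exact signature). It uses a `for` loop instead of `do...while` and writes the node-type numbers out so it does not rely on a global `Node` object:

```javascript
// DOM node type constants (Node.ELEMENT_NODE === 1, Node.TEXT_NODE === 3)
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Elements whose subtrees are never rewritten (same list as the answer above)
const DEFAULT_EXCLUDE = ['script', 'style', 'iframe', 'canvas'];

// Recursively replace `pattern` with `replacement` in every text node under
// `node`, skipping whole subtrees rooted at excluded elements.
// `pattern` and `replacement` are hypothetical parameters.
function replaceInTextNodes(node, pattern, replacement, exclude = DEFAULT_EXCLUDE) {
  for (let child = node.firstChild; child !== null; child = child.nextSibling) {
    if (child.nodeType === ELEMENT_NODE) {
      if (exclude.includes(child.tagName.toLowerCase())) {
        continue; // leave <script>, <style>, ... untouched
      }
      replaceInTextNodes(child, pattern, replacement, exclude);
    } else if (child.nodeType === TEXT_NODE) {
      // Only the text value changes; elements are not recreated
      child.nodeValue = child.nodeValue.replace(pattern, replacement);
    }
  }
}
```

In a browser you would call it as `replaceInTextNodes(document.body, /count/g, 'total count');`. Because only `nodeValue` changes, no HTML is re-parsed and existing elements (and their event listeners) stay in place, which is the main advantage over reassigning `innerHTML`.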
Upvotes: 1
Reputation: 15627
Maybe your best option is to use querySelectorAll with :not() to exclude the undesired elements, then replace the textContent instead of the innerHTML. By using innerHTML you risk breaking the document's tags.
This is a cross-browser solution.
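var matches = document.querySelectorAll("*:not(html):not(head):not(script):not(meta):not(link)");
console.log(matches);
[].forEach.call(matches, function(elem) {
    // Use innerText where available, otherwise fall back to textContent
    var text = ('innerText' in elem) ? 'innerText' : 'textContent';
    // Caution: assigning this property replaces all of elem's child nodes with
    // a single text node, so any nested markup inside elem is lost.
    // A string pattern replaces only the first occurrence.
    elem[text] = elem[text].replace("this", "works");
});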
Note 1: the HTML, HEAD, META and LINK elements are excluded because their textContent should not be modified.
Note 2: innerText originated as a proprietary IE property (Chrome supports it too). textContent is the property defined by the W3C DOM specification.
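Since assigning innerText/textContent to a matched element such as `<body>` flattens all of its children into plain text, one possible variant keeps the `:not()` selector but rewrites only each matched element's own text-node children. A sketch; `buildSelector` and `replaceOwnText` are hypothetical helper names, and the excluded-tag list in the usage comment is an assumption:

```javascript
// Build "*:not(html):not(head):..." from a list of tag names to exclude.
function buildSelector(excludedTags) {
  return '*' + excludedTags.map(tag => ':not(' + tag + ')').join('');
}

// Rewrite only the text nodes that are direct children of `elem`.
// Nested elements are matched by querySelectorAll on their own, so each text
// node is visited once (via its parent) and no markup is replaced.
function replaceOwnText(elem, pattern, replacement) {
  for (const child of Array.from(elem.childNodes)) {
    if (child.nodeType === 3) { // Node.TEXT_NODE
      child.nodeValue = child.nodeValue.replace(pattern, replacement);
    }
  }
}

// Browser usage (requires a DOM):
// const selector = buildSelector(['html', 'head', 'script', 'style', 'meta', 'link']);
// document.querySelectorAll(selector).forEach(elem => replaceOwnText(elem, /this/g, 'works'));
```

Text inside an excluded element such as `<script>` is never touched, because its parent is never matched.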
Upvotes: 3
Reputation: 2751
EDIT: I misunderstood your requirements
If you want something more sophisticated, try Douglas Crockford's walking the DOM function:
function walkTheDOM(node, func) {
    func(node);
    node = node.firstChild;
    while (node) {
        walkTheDOM(node, func);
        node = node.nextSibling;
    }
}
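walkTheDOM is a depth-first, pre-order traversal: it calls `func` on a node before its children, and it visits every node, including text nodes and the text inside `<script>`. A small sketch that records the visit order on a hand-built, DOM-like tree (the `mockText`/`mockElement` objects are assumptions so it runs outside a browser):

```javascript
function walkTheDOM(node, func) {
  func(node);
  node = node.firstChild;
  while (node) {
    walkTheDOM(node, func);
    node = node.nextSibling;
  }
}

// Minimal DOM-like nodes with only the properties walkTheDOM and the callback read
function mockText(value) {
  return { nodeType: 3, nodeValue: value, firstChild: null, nextSibling: null };
}
function mockElement(tagName, children) {
  children.forEach((child, i) => { child.nextSibling = children[i + 1] || null; });
  return { nodeType: 1, tagName, firstChild: children[0] || null, nextSibling: null };
}

const body = mockElement('BODY', [
  mockElement('P', [mockText('Hello')]),
  mockElement('SCRIPT', [mockText('var x = 1;')]),
]);

const visited = [];
walkTheDOM(body, node => {
  visited.push(node.nodeType === 1 ? node.tagName : '#text: ' + node.nodeValue);
});
// visited is ['BODY', 'P', '#text: Hello', 'SCRIPT', '#text: var x = 1;']
console.log(visited);
```

The last entry is the text node inside `<script>`. It has no `tagName`, so calling `node.tagName.toLowerCase()` on it would throw; checking the text node's parent is what actually lets you skip script contents.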
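You can use the tagName property to skip the contents of <script> elements. The walker also visits text nodes, which have no tagName, so check the tagName of the text node's parent:
walkTheDOM(document.body, function (node) {
    // Only text nodes hold page text; skip those inside <script>
    if (node.nodeType === 3 && node.parentNode.tagName.toLowerCase() !== 'script') {
        node.nodeValue = node.nodeValue.replace(regextgoeshere, 'replacement');
    }
});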
Upvotes: 1
Reputation: 8940
I haven't tested this, but you can try:
var pageText = document.body.innerHTML;
// Remove every <script ...>...</script> block; [\s\S] also matches newlines
var mypagewithoutScriptTag = pageText.replace(/<script\b[\s\S]*?<\/script>/gi, '');
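If you would rather keep the scripts than delete them, the same regex idea can split the markup into script and non-script chunks and apply the replacement only to the latter. A sketch, assuming scripts appear as plain `<script ...>...</script>` pairs (a regex cannot parse HTML in general; a `</script>` inside a string literal or comment would confuse it); `replaceOutsideScripts` is a hypothetical helper name. It still rewrites tag names and attributes outside scripts, unlike the node-based answers.

```javascript
// Matches one whole <script ...>...</script> block, across newlines, case-insensitively
const SCRIPT_BLOCK = /<script\b[\s\S]*?<\/script>/gi;

// Apply `pattern` -> `replacement` only to the markup between script blocks.
function replaceOutsideScripts(html, pattern, replacement) {
  let result = '';
  let lastIndex = 0;
  for (const match of html.matchAll(SCRIPT_BLOCK)) {
    // Markup before this script block gets the replacement...
    result += html.slice(lastIndex, match.index).replace(pattern, replacement);
    // ...while the script block itself is copied unchanged
    result += match[0];
    lastIndex = match.index + match[0].length;
  }
  // Remaining markup after the last script block
  result += html.slice(lastIndex).replace(pattern, replacement);
  return result;
}
```

Writing the result back with `document.body.innerHTML = ...` recreates every element, so event listeners attached to them are lost, and script elements inserted through innerHTML are not executed again.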
Upvotes: -1