Reputation: 176

How to access innerHTML but ignore <script> tags

I have a plugin that looks through the HTML and replaces text. However, with my current implementation, text inside script tags is getting caught in the search as well, which breaks the scripts on the affected pages.

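var pageText = document.body.innerHTML;
// regexGoesHere and replacement are placeholders for the real pattern and replacement text
document.body.innerHTML = pageText.replace(regexGoesHere, replacement);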
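To see why the scripts break: innerHTML hands back the whole body as one serialized string, so a regular expression applied to it matches inside <script> bodies just as readily as in visible text. A small illustration, using made-up markup and a hypothetical pattern:

```javascript
// Hypothetical markup standing in for document.body.innerHTML.
const html = '<p>Please log in</p><script>console.log("ready");</script>';

// A naive global replace over the whole serialized string.
const replaced = html.replace(/log/g, 'sign');

console.log(replaced);
// -> <p>Please sign in</p><script>console.sign("ready");</script>
// The visible text is fine, but console.sign is not a function,
// so the rewritten script would throw when it runs.
```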

I tried my best to filter them out with my regex pattern, but I need a way to skip the contents of all <script> tags.

Is there a way to skip <script> tags when getting innerHTML?

Upvotes: 2

Answers (4)

Jayanta

Reputation: 145

I think we tend to think in terms of elements and forget about nodes, but this problem is best solved by thinking in terms of nodes.

Australian Alex has the best solution: http://blog.alexanderdickson.com/javascript-replacing-text

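function myRecursiveSearch(node, .....) {

    // Elements whose contents must never be rewritten
    var excludeElements = ['script', 'style', 'iframe', 'canvas'];

    var child = node.firstChild;

    if (child == null)
        return;

    do {
        switch (child.nodeType) {

        case 1: // element node
            // Skip excluded elements entirely; `continue` jumps to the
            // while condition, which advances to the next sibling
            if (excludeElements.indexOf(child.tagName.toLowerCase()) > -1) {
                continue;
            }

            myRecursiveSearch(child, .....);
            break;

        case 3: // text node: the only place the text is rewritten
            child.nodeValue = doReplace(child.nodeValue, .....);
            break;

        }

    } while (child = child.nextSibling);

}


function doReplace(strtext, ....) {
    .....
    return strtext;
}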
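The recursive search above leaves its extra parameters and the replacement helper elided. Below is a minimal, self-contained sketch of the same node-based idea, assuming the goal is a single RegExp replacement. The function name replaceInTextNodes is hypothetical, and the numeric node types 1 and 3 correspond to Node.ELEMENT_NODE and Node.TEXT_NODE. It only reads firstChild, nextSibling, tagName and nodeValue, so in a browser it can be called on document.body.

```javascript
// Tags whose contents should never be rewritten (same list as above).
const EXCLUDED_TAGS = new Set(['script', 'style', 'iframe', 'canvas']);

const ELEMENT_NODE = 1; // Node.ELEMENT_NODE
const TEXT_NODE = 3;    // Node.TEXT_NODE

// Hypothetical helper: recursively rewrites every text node under `node`,
// skipping the whole subtree of any excluded element.
function replaceInTextNodes(node, pattern, replacement) {
  for (let child = node.firstChild; child !== null; child = child.nextSibling) {
    if (child.nodeType === ELEMENT_NODE) {
      if (!EXCLUDED_TAGS.has(child.tagName.toLowerCase())) {
        replaceInTextNodes(child, pattern, replacement);
      }
    } else if (child.nodeType === TEXT_NODE) {
      // Only the text content changes; markup and attributes are untouched.
      child.nodeValue = child.nodeValue.replace(pattern, replacement);
    }
  }
}

// Browser usage (hypothetical pattern): replaceInTextNodes(document.body, /log/g, 'sign');
```

Because only text node values are rewritten, the element structure, attributes and <script> contents stay as they were, unlike reassigning innerHTML.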

Upvotes: 1

rnrneverdies

Reputation: 15627

Maybe your best option is to use querySelectorAll and exclude the undesired elements with :not(), then replace the textContent instead of the innerHTML. By using innerHTML you risk breaking the document's tags.

This is a cross-browser solution.

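// Every element except the ones whose text must not be touched
var matches = document.querySelectorAll("*:not(html):not(head):not(script):not(meta):not(link)");
console.log(matches);
[].forEach.call(matches, function(elem) {
  // Prefer innerText where it exists, otherwise fall back to textContent
  var text = ('innerText' in elem) ? 'innerText' : 'textContent';
  // Assigning innerText/textContent replaces all of elem's children with a single text node
  elem[text] = elem[text].replace("this", "works");
});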

http://jsfiddle.net/m6qhuesv/

Note 1: The HTML, HEAD, META and LINK elements don't allow their textContent to be modified.

Note 2: innerText started out as a proprietary IE property (it also works in Chrome). The W3C defines textContent as the standard property.
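One detail in the snippet above: when String.prototype.replace is given a string as its pattern, it replaces only the first occurrence in each element's text. If every occurrence should change, a RegExp with the g flag is needed, as this small sketch with made-up sample text shows:

```javascript
const sample = 'this and this';

// With a string pattern, replace() changes only the first occurrence.
const firstOnly = sample.replace('this', 'works');   // 'works and this'

// With a global RegExp, every occurrence is replaced.
const everyMatch = sample.replace(/this/g, 'works'); // 'works and works'
```

In engines that support ES2021, sample.replaceAll('this', 'works') gives the same result as the global RegExp.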

Upvotes: 3

DeeDee

Reputation: 2751

EDIT: I misunderstood your requirements

If you want something more sophisticated, try Douglas Crockford's walking the DOM function:

function walkTheDOM(node, func) {
    func(node);
    node = node.firstChild;
    while (node) {
        walkTheDOM(node, func);
        node = node.nextSibling;
    }
}

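You can check the tagName of each text node's parent to skip the text inside <script> elements:

walkTheDOM(document.body, function (node) {
    // Text nodes (nodeType 3) have no tagName, so check the parent; regexGoesHere/replacement are placeholders
    if (node.nodeType === 3 && node.parentNode.tagName.toLowerCase() !== 'script') {
        node.nodeValue = node.nodeValue.replace(regexGoesHere, replacement);
    }
});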
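Crockford's walkTheDOM always descends into every child, so the text node inside a <script> is still handed to the callback, which then has to inspect that node's parent. A small variant lets the callback return false to skip an element's whole subtree, so it can stop at the <script> element itself. The names walkTheDOMPruned and makeReplacer are hypothetical and not part of Crockford's original; this is a sketch, not a drop-in replacement.

```javascript
// Variant of walkTheDOM: if func returns false for a node,
// that node's children are not visited.
function walkTheDOMPruned(node, func) {
  if (func(node) === false) {
    return; // skip this subtree, e.g. the contents of a <script>
  }
  node = node.firstChild;
  while (node) {
    walkTheDOMPruned(node, func);
    node = node.nextSibling;
  }
}

// Hypothetical callback factory: prune at <script> elements,
// rewrite every other text node (nodeType 3).
function makeReplacer(pattern, replacement) {
  return function (node) {
    if (node.nodeType === 1 && node.tagName.toLowerCase() === 'script') {
      return false;
    }
    if (node.nodeType === 3) {
      node.nodeValue = node.nodeValue.replace(pattern, replacement);
    }
    return true;
  };
}

// Browser usage (hypothetical pattern):
// walkTheDOMPruned(document.body, makeReplacer(/log/g, 'sign'));
```

The pruning keeps the per-node check in one place (the element) instead of repeating a parent lookup for every text node.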

Upvotes: 1

AtanuCSE

Reputation: 8940

Didn't check but you can try.

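var pageText = document.body.innerHTML;
// [\s\S]*? matches across line breaks; [^>]* allows attributes such as type="..."
var mypagewithoutScriptTag = pageText.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');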
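A regex that strips script blocks has to cope with two things a (.*?) between literal <script> and </script> misses: "." does not match newline characters, and real script tags often carry attributes such as type="...". A quick check with made-up markup:

```javascript
const markup =
  '<p>Keep me</p>\n' +
  '<script type="text/javascript">\n  console.log("ready");\n</script>\n' +
  '<p>Keep me too</p>';

// "." does not match "\n", and the literal "<script>" never matches a tag
// with attributes, so this pattern leaves the script in place.
const naive = markup.replace(/<script>(.*?)<\/script>/g, '');

// [\s\S] matches any character including newlines; [^>]* allows attributes.
const stripped = markup.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');
```

Like any regular expression over HTML, this is a heuristic rather than a parser, and it removes the scripts rather than protecting them from the replacement.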

Upvotes: -1
