What is the advantage of using .innerHTML for escaping characters?

Question

I'm trying to escape characters in JS/HTML, but I can't work out how to do it. I've seen examples that use .innerHTML, but I don't understand how they work. Can someone please explain it in simple terms?

Pointy · Accepted Answer

If you add content as raw text (like, as the value of a text node), and then query the .innerHTML of the container, you get back escaped HTML because that's what it'd have to look like if you were to set the .innerHTML:

    var d = document.createElement('span');
    var t = document.createTextNode("<b>Hello World</b>");  // raw text, not markup
    d.appendChild(t);
    console.log(d.innerHTML); // logs &lt;b&gt;Hello World&lt;/b&gt;

It's just the way that the .innerHTML mechanism behaves.

According to the MDN documentation, the only characters that are affected are <, >, and &. There are times when it's useful to encode other characters with HTML entities. The most common situation, I think, is when you want to use quotes in an HTML attribute.
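To make the attribute case concrete, here is a minimal plain-JavaScript sketch (no DOM needed, so it also runs under Node). `escapeText` is a hypothetical helper that replaces only the three characters listed above; `escapeAttr` is another hypothetical helper that additionally escapes both quote characters. Neither is a browser API, and this is not a claim about exactly how browsers serialize attribute values.

```javascript
// Hypothetical helper: replaces only &, < and >, the characters the answer
// says .innerHTML escapes in text. "&" is replaced first; otherwise the
// "&lt;" produced for "<" would be turned into "&amp;lt;".
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper for attribute values: also escapes " and ' so the
// value cannot close a quoted attribute early.
function escapeAttr(s) {
  return escapeText(s).replace(/"/g, "&quot;").replace(/'/g, "&#39;");
}

const title = 'Say "hi" & <wave>';
console.log(escapeText(title)); // Say "hi" &amp; &lt;wave&gt;
console.log(`<input value="${escapeAttr(title)}">`);
// <input value="Say &quot;hi&quot; &amp; &lt;wave&gt;">
```

With `escapeText` alone, the `"` before `hi` would end the `value` attribute, which is exactly the situation where encoding quotes matters.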

An alternative to relying on the browser's DOM behavior is to use your own JavaScript function. Here's a (slightly modified) version of the code used in the doT template library:

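    function encodeHTMLSource() {
      // Map each "naughty" character to a numeric HTML entity.
      var encodeHTMLRules = {
          "&": "&#38;", "<": "&#60;", ">": "&#62;", '"': "&#34;", "'": "&#39;", "/": "&#47;"
        },
        // Match any mapped character, but skip a "&" that already starts an entity.
        matchHTML = /&(?!#?\w+;)|<|>|"|'|\//g;

      // The returned function closes over encodeHTMLRules and matchHTML;
      // "this" is the string the method is called on.
      return function() {
        return this ? this.replace(matchHTML, function(m) {
          return encodeHTMLRules[m] || m;
        }) : this;
      };
    }
    String.prototype.encodeHTML = encodeHTMLSource();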

This function is designed to be added to the String prototype, which some might find distasteful (that seems to be a recent change; my older version doesn't do this). The idea is that it uses a closure to keep a mapping from the "naughty" characters to their HTML entity equivalents, as well as a regular expression to find characters to convert. Once you've done the above, you can escape any string with:

    var escaped = "Hello World".encodeHTML();

The regular expression is written such that it avoids re-encoding existing HTML entities.
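If patching `String.prototype` is not to your taste, the same idea works as an ordinary function. The sketch below is a standalone variant (the names `ENCODE_HTML_RULES`, `MATCH_HTML` and `encodeHTML` are illustrative, not part of doT) that uses the same regular expression as the code above with numeric character references as replacements, and shows that an existing entity is not encoded a second time.

```javascript
// Numeric character references for each character that needs escaping.
const ENCODE_HTML_RULES = {
  "&": "&#38;", "<": "&#60;", ">": "&#62;",
  '"': "&#34;", "'": "&#39;", "/": "&#47;",
};

// "&" only matches when it is NOT followed by something shaped like an
// entity body (name; or #123;), so existing entities pass through untouched.
const MATCH_HTML = /&(?!#?\w+;)|<|>|"|'|\//g;

function encodeHTML(str) {
  return String(str).replace(MATCH_HTML, (m) => ENCODE_HTML_RULES[m] || m);
}

console.log(encodeHTML("Tom & Jerry"));     // Tom &#38; Jerry
console.log(encodeHTML("Tom &amp; Jerry")); // Tom &amp; Jerry
console.log(encodeHTML("</script>"));       // &#60;&#47;script&#62;
```

One side effect of the lookahead: any text that merely looks like an entity, such as `&foo;`, is also left unescaped, even if it was meant literally.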
