Micah

Reputation: 116090

How do I escape some html in javascript?

Given the text

<b>This is some text</b>

I want to write it to my page so that it shows up like this:

<b>This is some text</b>

and not like this

This is some text

Using escape("<b>This is some text</b>") gives me this lovely gem in Firefox:

%3Cb%3EThis%20is%20some%20text%3C/b%3E

Not exactly what I'm after. Any ideas?

Upvotes: 36

Views: 59851

Answers (9)

Paolo

Reputation: 15827

I use the following function that escapes every character with the &#nnn; notation except a-z A-Z 0-9 and space.

You can try with Escape('<b>This is some \'text\'</b>'):

function Escape(s) {
  var h,
      i,
      n,
      c;

  n = s.length;
  h = '';

  for (i = 0; i < n; i++) {
    c = s.charCodeAt(i);
    if ((c >= 48 && c <= 57) ||
      (c >= 65 && c <= 90) ||
      (c >= 97 && c <= 122) ||
      (c == 32)) {
      h += String.fromCharCode(c);
    } else {
      h += '&#' + c + ';';
    }
  }

  return h;
}

console.log(Escape('<b>This is some \'text\'</b>'))

Note that single and double quotes are properly escaped.

The function is pure JavaScript. Because it encodes everything except a-z, A-Z, 0-9 and space, it is resistant to code injection, and it also encodes non-ASCII characters numerically.

This approach is about 50 times slower than the one that creates a DOM text node, but the function still escapes a one-million-character (1,000,000) string in 100-150 milliseconds.

(Tested on early 2011 MacBook Pro - Safari 9 - Mavericks)
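
One caveat worth noting: charCodeAt returns UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for example) is emitted as two surrogate references rather than one. Below is a minimal sketch of a variant that iterates by code point instead, assuming an ES2015+ runtime. The name escapeByCodePoint is hypothetical and not part of the answer above.

```javascript
// Hypothetical variant of Escape(): walks code points, not UTF-16 code units.
function escapeByCodePoint(s) {
  let out = '';
  // for...of iterates a string by code point, so surrogate pairs stay together.
  for (const ch of String(s)) {
    if (/^[0-9A-Za-z ]$/.test(ch)) {
      out += ch; // safe character, copied as-is
    } else {
      out += '&#' + ch.codePointAt(0) + ';'; // decimal numeric character reference
    }
  }
  return out;
}

console.log(escapeByCodePoint('<b>Hi</b>')); // &#60;b&#62;Hi&#60;&#47;b&#62;
console.log(escapeByCodePoint('\u{1F600}')); // &#128512;
```

This is likely somewhat slower than the loop above, since it runs a regex test per character; it is meant only to show the code-point handling.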

Upvotes: 0

user2226755

Reputation: 13167

I think you should change your approach: don't escape HTML just so you can assign it to innerHTML. Instead, create an element with createElement and assign the untrusted input through innerText.

Solution for Vanilla JavaScript in a DOM environment

Instead of:

// vulnerable
const html = "<b>Hello World!</b>"
const element = `<div>${html}</div>`

document.body.innerHTML = element 

You should do:

// secure
const html = '<b>Hello World!</b>'
const element = document.createElement('div')
element.innerText = html 

document.body.appendChild(element)
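
The same idea can be wrapped in a small reusable helper. This is only a sketch: the name appendAsText and the optional doc parameter are assumptions added for illustration, and it uses textContent, which also stores the string as plain text but, unlike innerText, does not depend on layout.

```javascript
// Hypothetical helper: appends untrusted input as plain text, never as markup.
// `doc` is an assumed optional parameter; it defaults to the global document.
function appendAsText(parent, untrusted, doc) {
  doc = doc || document;
  const el = doc.createElement('div');
  el.textContent = untrusted; // stored as text, so tags are shown literally
  parent.appendChild(el);
  return el;
}

// Usage in a browser; skipped where no DOM is available (for example Node.js).
if (typeof document !== 'undefined') {
  appendAsText(document.body, '<b>Hello World!</b>');
}
```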

Upvotes: 0

limc

Reputation: 40168

This should work for you: http://blog.nickburwell.com/2011/02/escape-html-tags-in-javascript.html

function escapeHTML( string )
{
    var pre = document.createElement('pre');
    var text = document.createTextNode( string );
    pre.appendChild(text);
    return pre.innerHTML;
}

Security Warning

This function only escapes HTML tags; it does not escape single or double quotes, which can still lead to XSS if the result is used in the wrong context, such as an attribute value. For example:

 // >> ⚠️ WARNING: VULNERABILITY SHOWCASE WITH SINGLE AND DOUBLE QUOTE
 var userWebsite = '" onmouseover="alert(\'gotcha\')" "';
 var profileLink = '<a href="' + escapeHTML(userWebsite) + '">Bob</a>';
 // << DON'T FOLLOW THIS EXAMPLE
 
 var div = document.getElementById('target');
 div.innerHTML = profileLink;
 // <a href="" onmouseover="alert('gotcha')" "">Bob</a>

Thanks to buffer for pointing out this case. Snippet taken out of this blog post.
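
For contrast, here is a minimal sketch of the same link built with a helper that also escapes both quote characters. The name escapeHTMLAttr is hypothetical, and the sketch assumes the value always ends up inside a quoted attribute.

```javascript
// Hypothetical helper: escapes the five characters that matter in HTML text
// and in quoted attribute values. The ampersand is replaced first so the
// entities produced by the later steps are not escaped a second time.
function escapeHTMLAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

var userWebsite = '" onmouseover="alert(\'gotcha\')" "';
var profileLink = '<a href="' + escapeHTMLAttr(userWebsite) + '">Bob</a>';
console.log(profileLink);
// <a href="&quot; onmouseover=&quot;alert(&#39;gotcha&#39;)&quot; &quot;">Bob</a>
```

The injected quotes can no longer close the href attribute. This does not validate the URL itself, so a javascript: URL would still need separate handling.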

Upvotes: 61

Stephen Quan

Reputation: 25956

I like @limc's answer for situations where the HTML DOM document is available.

I like @Michele Bosi's and @Paolo's answers for environments without an HTML DOM document, such as Node.js.

@Michele Bosi's answer can be optimized by replacing the four calls to replace with a single call that uses a replacer function:

function escape(s) {
    let lookup = {
        '&': "&amp;",
        '"': "&quot;",
        '\'': "&apos;",
        '<': "&lt;",
        '>': "&gt;"
    };
    return s.replace( /[&"'<>]/g, c => lookup[c] );
}
console.log(escape("<b>This is 'some' text.</b>"));

@Paolo's range test can be optimized with a well chosen regex and the for loop can be eliminated by using a replacer function:

function escape(s) {
    return s.replace(
        /[^0-9A-Za-z ]/g,
        c => "&#" + c.charCodeAt(0) + ";"
    );
}
console.log(escape("<b>This is 'some' text</b>"));

As @Paolo indicated, this strategy will work for more scenarios.

Upvotes: 44

Dave Brown

Reputation: 939

You can encode all characters in your string:

function encode(e){return e.replace(/[^]/g,function(e){return"&#"+e.charCodeAt(0)+";"})}

Or just target the main characters to worry about (&, line breaks, <, >, " and '):

function encode(r){
return r.replace(/[\x26\x0A\<>'"]/g,function(r){return"&#"+r.charCodeAt(0)+";"})
}

test.value=encode('Encode HTML entities!\n\n"Safe" escape <script id=\'\'> & useful in <pre> tags!');

testing.innerHTML=test.value;

/*************
* \x26 is &ampersand (it has to be first),
* \x0A is newline,
*************/
<textarea id=test rows="9" cols="55"></textarea>

<div id="testing">www.WHAK.com</div>

Upvotes: 1

Michele Bosi

Reputation: 317

I ended up doing this:

function escapeHTML(s) { 
    return s.replace(/&/g, '&amp;')
            .replace(/"/g, '&quot;')
            .replace(/</g, '&lt;')
            .replace(/>/g, '&gt;');
}

Upvotes: 26

Zenexer

Reputation: 19613

Traditional Escaping

If you're using XHTML, you'll need to use a CDATA section. You can use these in HTML, too, but HTML isn't as strict.

I split up the string constants so that this code will work inline on XHTML within CDATA blocks. If you are sourcing your JavaScript as separate files, then you don't need to bother with that. Note that if you are using XHTML with inline JavaScript, then you need to enclose your code in a CDATA block, or some of this will not work. You will run into odd, subtle errors.

function htmlentities(text) {
    var escaped = text.replace(/\]\]>/g, ']]' + '>]]&gt;<' + '![CDATA[');
    return '<' + '![CDATA[' + escaped + ']]' + '>';
}
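
To make the replacement step concrete, here is a small worked example. The function is repeated from above only so the block runs on its own, and the input string is an assumption chosen to contain the ]]> terminator.

```javascript
// Copied from the answer so this block is self-contained.
function htmlentities(text) {
    var escaped = text.replace(/\]\]>/g, ']]' + '>]]&gt;<' + '![CDATA[');
    return '<' + '![CDATA[' + escaped + ']]' + '>';
}

// A "]]>" inside the text would end the CDATA section early, so the function
// closes the section, emits "]]&gt;" as ordinary markup, then reopens it.
console.log(htmlentities('a ]]> b'));
// <![CDATA[a ]]>]]&gt;<![CDATA[ b]]>

// Text without the terminator is simply wrapped.
console.log(htmlentities('<b>bold?</b>'));
// <![CDATA[<b>bold?</b>]]>
```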

DOM Text Node

The "proper" way to escape text is to use the DOM function document.createTextNode. This doesn't actually escape the text; it just tells the browser to create a text node, which is inherently unparsed. You have to be willing to use the DOM for this method to work, however: that is, you have to use methods such as appendChild, as opposed to the innerHTML property and similar. The following adds text to the element with ID an-element, and that text is not parsed as (X)HTML:

var textNode = document.createTextNode("<strong>This won't be bold.  The tags " +
    "will be visible.</strong>");
document.getElementById('an-element').appendChild(textNode);

jQuery DOM Wrapper

jQuery provides a handy wrapper for createTextNode named text. It's quite convenient. Here's the same functionality using jQuery:

$('#an-element').text("<strong>This won't be bold.  The tags will be " +
    "visible.</strong>");

Upvotes: 3

Headshota

Reputation: 21449

Try this htmlentities for javascript

function htmlEntities(str) {
    return String(str).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

Upvotes: 6

meouw

Reputation: 42140

Here's a function that replaces angle brackets with their html entities. You might want to expand it to include other characters too.

function htmlEntities( html ) {
    html = html.replace( /[<>]/g, function( match ) {
        if( match === '<' ) return '&lt;';
        else return '&gt;';
    });
    return html;
}

console.log( htmlEntities( '<b>replaced</b>' ) ); // &lt;b&gt;replaced&lt;/b&gt;

Upvotes: 1
