Mhd

Reputation: 2978

Validate input HTML using JavaScript

I need to validate HTML user input in a web app using JavaScript.

What I have done so far, based on this question: I'm using a third-party library, sanitize-html, to sanitize the input and then compare the result to the original. If they differ, the HTML is invalid.

const isValidHtml = (html: string): boolean => {
    let sanitized = sanitizeHtml(html, sanitizationConfig);
    sanitized = sanitized.replace(/\s/g, '').replace(/<br>|<br\/>/g, ''); // different browser's behavior for <br>
    html = html.replace(/\s/g, '').replace(/<br>|<br\/>/g, '');
    return sanitized === html;
}

The above method works fine with unescaped HTML but not with escaped HTML.

isValidHtml('<'); // false
isValidHtml('&lt;'); // true
isValidHtml('<script>'); // false
isValidHtml('&lt;script&gt;'); // true, this should be false also!!!

1. Am I missing something with this method?
2. Is there a better way to do this task?

EDIT: As suggested by @brad in the comments, I tried to decode the HTML first:

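const decodeHtml = (html: string): string => {
    // A detached <textarea> decodes entities when its innerHTML is set.
    const txt = document.createElement('textarea');
    txt.innerHTML = html;
    const decodedHtml = txt.value;
    txt.textContent = null; // clear the temporary element
    return decodedHtml;
}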

and then called isValidHtml(decodeHtml(html)). I got this result:

isValidHtml('<'); // false
isValidHtml('&lt;'); // false, this should be true!!!
isValidHtml('<script>'); // false
isValidHtml('&lt;script&gt;'); // false
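
One possible reading of the results above: decoding turns `&lt;` into a literal `<`, and anything that parses and re-serializes text escapes that `<` again, so the string comparison fails. The sketch below imitates that round trip for plain text only. `decodeBasicEntities`, `escapeText`, and `unchangedAfterReserialize` are hypothetical helpers written for illustration; they are not part of sanitize-html, and the assumption that the sanitizer escapes text this way is inferred from the results in the question, not from its documentation.

```typescript
// Hypothetical stand-in for decodeHtml: decodes only four named entities.
// '&amp;' is handled last so that '&amp;lt;' decodes one level, to '&lt;'.
const decodeBasicEntities = (s: string): string =>
    s.replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&quot;/g, '"')
        .replace(/&amp;/g, '&');

// Hypothetical stand-in for the escaping applied when text is serialized.
const escapeText = (s: string): string =>
    s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

// Text-only model of "parse, then serialize", followed by the same
// equality check that isValidHtml performs.
const unchangedAfterReserialize = (s: string): boolean =>
    escapeText(decodeBasicEntities(s)) === s;

const escaped = '&lt;';
const decoded = decodeBasicEntities(escaped); // '<'

console.log(unchangedAfterReserialize(escaped)); // true
console.log(unchangedAfterReserialize(decoded)); // false
```

Under this model the escaped input is already in its serialized form, which is why it passes, while the decoded input gets re-escaped and no longer matches itself.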

Upvotes: 2

Views: 466

Answers (1)

Brad

Reputation: 163301

If you're not actually trying to validate the HTML, and are simply trying to ensure it ends up being valid, I would recommend running it through the DOM parser and getting the HTML back out, effectively letting the browser do the work for you.

Untested, but something like this:

const parser = new DOMParser();
const doc = parser.parseFromString(html, 'text/html');
console.log(doc.documentElement.innerHTML);

Basically, you use the browser's built-in parsing to handle any errors, in the standard way that it does anyway. It will create a tree of nodes. From that tree of nodes, you generate HTML that is guaranteed to be valid.
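
A slightly fuller sketch of the same idea, still untested in a real project. `normalizeHtml` is a hypothetical name, and the small `ParserLike` and `ParsedDoc` interfaces exist only so the snippet compiles without the DOM type library; in a browser project you could call `new DOMParser()` directly. It reads `body.innerHTML` instead of `documentElement.innerHTML` so the result does not include the `<head>` and `<body>` wrappers the parser adds.

```typescript
// Minimal structural types (assumptions for this sketch, not a real API surface).
interface ParsedDoc { body: { innerHTML: string } }
interface ParserLike { parseFromString(html: string, type: string): ParsedDoc }

// Parses a fragment with the browser's HTML parser and serializes it back.
// Returns null when no DOMParser is available (for example, outside a browser).
const normalizeHtml = (html: string): string | null => {
    const Ctor = (globalThis as unknown as { DOMParser?: new () => ParserLike }).DOMParser;
    if (!Ctor) {
        return null;
    }
    const doc = new Ctor().parseFromString(html, 'text/html');
    return doc.body.innerHTML;
};

// In a browser, an unclosed tag comes back closed:
// normalizeHtml('<b>bold') gives '<b>bold</b>'
console.log(normalizeHtml('<b>bold'));
```

Note that this only normalizes markup. It does not remove `<script>` elements or event-handler attributes, so a sanitizer is still needed if the input is untrusted.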

See also: https://developer.mozilla.org/en-US/docs/Web/API/DOMParser#Parsing_an_SVG_or_HTML_document

Upvotes: 2
