Reputation: 2978
I need to validate HTML user input in a web app using JavaScript.
What I have done so far, based on this question: I'm using a third-party library, sanitize-html, to sanitize the input and then compare the result to the original. If they differ, the HTML is invalid.
const isValidHtml = (html: string): boolean => {
  let sanitized = sanitizeHtml(html, sanitizationConfig);
  // Strip whitespace and <br> variants, since browsers serialize <br> differently
  sanitized = sanitized.replace(/\s/g, '').replace(/<br>|<br\/>/g, '');
  html = html.replace(/\s/g, '').replace(/<br>|<br\/>/g, '');
  return sanitized === html;
};
The above method works fine with unescaped HTML but not with escaped HTML.
isValidHtml('<'); // false
isValidHtml('&lt;'); // true
isValidHtml('<script>'); // false
isValidHtml('&lt;script&gt;'); // true, this should be false also!!!
EDIT: As suggested by @brad in the comments, I tried to decode the HTML first:
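const decodeHtml = (html: string): string => {
  // A <textarea> decodes character references without interpreting the content as markup
  const txt = document.createElement('textarea');
  txt.innerHTML = html;
  const decodedHtml = txt.value;
  txt.textContent = null;
  return decodedHtml;
};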
and then called isValidHtml(decodedHtml). I got this result:
isValidHtml('<'); // false
isValidHtml('&lt;'); // false, this should be true!!!
isValidHtml('<script>'); // false
isValidHtml('&lt;script&gt;'); // false
Upvotes: 2
Views: 466
Reputation: 163301
If you're not actually trying to validate the HTML, and are simply trying to ensure it ends up being valid, I would recommend running it through the DOM parser and getting the HTML back out, effectively letting the browser do the work for you.
Untested, but something like this:
const parser = new DOMParser();
const doc = parser.parseFromString(html, 'text/html');
console.log(doc.documentElement.innerHTML);
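Basically, you use the browser's built-in parser, which recovers from errors in the standard way it does for any page. It will create a tree of nodes. From that tree of nodes, you generate HTML that is guaranteed to be valid. Note that doc.documentElement.innerHTML includes the generated <head> and <body> wrappers; doc.body.innerHTML returns only the body content.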
See also: https://developer.mozilla.org/en-US/docs/Web/API/DOMParser#Parsing_an_SVG_or_HTML_document
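If validation is still the goal, the escaped cases from the question could be handled by a separate check. Escaped markup such as `&lt;script&gt;` is plain text to a sanitizer, so a sanitize-and-compare test leaves it unchanged. Below is a minimal sketch, not a complete solution. The names `decodeBasicEntities`, `countTagLike`, and `containsEscapedMarkup` are hypothetical, the entity table covers only a handful of named references, and the tag pattern is a rough heuristic rather than a real HTML parser.

```typescript
// Hypothetical helper: a small, illustrative subset of named character references.
// Real HTML defines many more; extend or replace this table as needed.
const NAMED_ENTITIES = new Map<string, string>([
  ['lt', '<'],
  ['gt', '>'],
  ['amp', '&'],
  ['quot', '"'],
  ['apos', "'"],
]);

// Decode named (subset only), decimal, and hexadecimal references in a single pass.
function decodeBasicEntities(text: string): string {
  return text.replace(
    /&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi,
    (match: string, body: string): string => {
      if (body[0] === '#') {
        const isHex = body[1] === 'x' || body[1] === 'X';
        const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave out-of-range references untouched instead of throwing.
        return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
      }
      const decoded = NAMED_ENTITIES.get(body);
      return decoded === undefined ? match : decoded;
    },
  );
}

// Count substrings shaped like an opening or closing tag, e.g. <script> or </p>.
function countTagLike(text: string): number {
  const matches = text.match(/<\/?[a-z][^>]*>/gi);
  return matches === null ? 0 : matches.length;
}

// Heuristic: flag input when decoding introduces tag-like text that was not there before.
function containsEscapedMarkup(html: string): boolean {
  return countTagLike(decodeBasicEntities(html)) > countTagLike(html);
}

console.log(containsEscapedMarkup('&lt;'));            // false
console.log(containsEscapedMarkup('&lt;script&gt;'));  // true
console.log(containsEscapedMarkup('<p>&amp;</p>'));    // false
```

Assuming `isValidHtml` behaves as reported in the question, `isValidHtml(html) && !containsEscapedMarkup(html)` would give false, true, false, false for the four test inputs. The decoding is a single pass, so double-escaped input such as `&amp;lt;script&amp;gt;` is not flagged.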
Upvotes: 2