Reputation: 29
I have a response from a semi-untrusted API that is supposed to contain HTML. I want to convert it to plain text, basically stripping out all the formatting so I can easily search it and then display (part of) it.
I have come up with this:
function convertHtmlToText(html) {
  const div = document.createElement("div");
  // assumption: because the div is not part of the document
  // - no scripts are executed
  // - no layout pass
  div.innerHTML = html;
  // assumption: whitespace is still normalized
  // assumption: this returns the text a user would see, if the element was inserted into the DOM.
  // Minus the stuff that would depend on stylesheets anyway.
  return div.innerText;
}
const html = `
Some random untrusted string that is supposed to contain html.
Presumably some 'rich text'.
A few <div> or <p>, a link or two, a bit of <strong> and some such.
In any case not a complete html document.
`;
const text = convertHtmlToText(html);
const p = document.createElement("p");
p.textContent = text;
document.body.append(p);
I think that this is safe/secure, because scripts are not executed as long as the div used for conversion is not inserted into the document.
Question: Is this safe/secure?
Upvotes: 2
Views: 717
Reputation: 136707
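No, this is not safe at all. Event handler attributes such as onerror still run on elements created through innerHTML, even while the div is detached from the document: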
function convertHtmlToText(html) {
  const div = document.createElement("div");
  // assumption: because the div is not part of the document
  // - no scripts are executed
  // - no layout pass
  div.innerHTML = html;
  // assumption: whitespace is still normalized
  // assumption: this returns the text a user would see, if the element was inserted into the DOM.
  // Minus the stuff that would depend on stylesheets anyway.
  return div.innerText;
}
const html = `<img onerror="alert('Gotcha!')" src="">Hi`;
const text = convertHtmlToText(html);
const p = document.createElement("p");
p.textContent = text;
document.body.append(p);
If you really only need the text content, then prefer a DOMParser, which will not execute any script:
function convertHtmlToText(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return doc.body.innerText;
}
const html = `<img onerror="alert('Gotcha!')" src="">Hi`;
const text = convertHtmlToText(html);
const p = document.createElement("p");
p.textContent = text;
document.body.append(p);
But beware: these methods will also pick up the text content of nodes users can't normally see (e.g. <style> or <script>).
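If that hidden text is a problem, one possible workaround is to remove those elements from the parsed document before reading the text. The following is only a sketch: the name htmlToVisibleText, the selector list, the optional parser parameter, and the whitespace collapsing are my own assumptions, not something the DOMParser API prescribes.

```javascript
// Sketch only: parse in an inert document, drop elements whose text is
// never rendered, then read what is left.
// `htmlToVisibleText` and the `parser` parameter are hypothetical names.
function htmlToVisibleText(html, parser = new DOMParser()) {
  const doc = parser.parseFromString(html, "text/html");
  // Assumption: <script> and <style> are the only elements to drop;
  // extend the selector if other non-rendered elements matter to you.
  for (const el of doc.querySelectorAll("script, style")) {
    el.remove();
  }
  // textContent needs no layout, so collapse whitespace runs manually.
  return doc.body.textContent.replace(/\s+/g, " ").trim();
}

// Usage in a browser:
//   const p = document.createElement("p");
//   p.textContent = htmlToVisibleText(untrustedHtml);
//   document.body.append(p);
```

Note that this still cannot know about elements hidden through stylesheets or the hidden attribute, and that assigning the result through textContent (not innerHTML) remains what keeps the display step safe.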
Upvotes: 4