Reputation: 186
I'm creating a Chrome extension that strips the punctuation from a page, but my code also affects the HTML tags of the page. The IDs, the styles, and even the SVG paths are changed by the punctuation removal.
function removePunctuation(text) {
  text = text.replace(/[\.#!£$%\^&\*;:{}=\-_`~()@\+\?\[\]\+]/g, ' ').replace(/ +/g, ' ')
  return text
}
let textTags = document.querySelectorAll('p, h1, h2, h3, h4, h5, h6, li, td, caption, span, a, div');
for (let i = 0, l = textTags.length; i < l; i++) {
  textTags[i].innerHTML = removePunctuation(textTags[i].innerHTML)
}
This uses document.querySelectorAll and the String.prototype.replace() method.
I unsuccessfully tried to save the position of each tag, take it out and add it back in once the punctuation is removed. It always excludes the parameters or misses out on some tags.
I also tried to find a regex that would exclude HTML tags from the removal of punctuation but looking at regex as a whole, I'm not sure if that's even possible!
Upvotes: 1
Views: 230
Reputation: 195992
You need to find the text nodes in the document, then update their nodeValue.
Taking some code from "How to get the text node of an element?", you can extract all the text nodes and then apply your code only to these:
// Recursively collect every non-empty text node below el
const deepNonEmptyTextNodes = el => [...el.childNodes].flatMap(e =>
  e.nodeType === Node.TEXT_NODE && e.textContent.trim() ?
    e : deepNonEmptyTextNodes(e)
);
function removePunctuation(text) {
  return text.replace(/[\.#!£$%\^&\*;:{}=\-_`~()@\+\?\[\]\+]/g, ' ').replace(/ +/g, ' ');
}
let textTags = [...document.querySelectorAll('p, h1, h2, h3, h4, h5, h6, li, td, caption, span, a, div')];
textTags.forEach(tagNode => {
  const textNodes = deepNonEmptyTextNodes(tagNode);
  // Only the text is rewritten; tags and attributes are never touched
  textNodes.forEach(node => { node.nodeValue = removePunctuation(node.nodeValue); });
});
<p>clean up the following !£$%^& chars</p>
<strong>stay as $%^&* you are</strong>
<div>clean these !£$%^&[]@ also</div>
Keep in mind, though, that this code will also modify text nodes inside tags you do not want, if they are nested inside tags you do want. So, if a strong tag is inside a div, it will be cleaned up too.
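If that nesting behaviour is a problem, one option is to stop the recursion at tags you want left alone. The following is only a rough sketch, not something from the linked answer: deepTextNodesExcept and SKIP_TAGS are names I made up for illustration, and the skip list is just an example you would adjust. The constants 3 and 1 are the numeric values of Node.TEXT_NODE and Node.ELEMENT_NODE.

```javascript
const TEXT_NODE = 3;    // numeric value of Node.TEXT_NODE
const ELEMENT_NODE = 1; // numeric value of Node.ELEMENT_NODE

// Hypothetical skip list: elements whose text should stay untouched.
const SKIP_TAGS = new Set(['STRONG', 'SCRIPT', 'STYLE']);

// Like deepNonEmptyTextNodes, but does not descend into skipped elements.
const deepTextNodesExcept = (el, skip = SKIP_TAGS) =>
  [...el.childNodes].flatMap(node => {
    if (node.nodeType === TEXT_NODE) {
      // Keep only text nodes that contain something besides whitespace
      return node.nodeValue.trim() ? [node] : [];
    }
    if (node.nodeType === ELEMENT_NODE && !skip.has(node.tagName.toUpperCase())) {
      return deepTextNodesExcept(node, skip);
    }
    return []; // comments, skipped elements, and so on
  });
```

You would then call deepTextNodesExcept(tagNode) in place of deepNonEmptyTextNodes(tagNode) in the loop. The toUpperCase() call is there because tagName is upper case for HTML elements but not for SVG ones.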
In the same answer I linked to, you will find other methods that only return the immediate child text nodes of an element, if that is what you want.
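For reference, an immediate-children version could look roughly like this. It is a sketch in the spirit of those methods rather than a copy of any of them; immediateTextNodes and cleanImmediateText are hypothetical names, and clean stands for any string-to-string function such as removePunctuation.

```javascript
const TEXT_NODE = 3; // numeric value of Node.TEXT_NODE

// Direct child text nodes of el only; nested elements are left alone.
const immediateTextNodes = el =>
  [...el.childNodes].filter(
    node => node.nodeType === TEXT_NODE && node.nodeValue.trim()
  );

// Applies clean() to each direct text node of el.
// clean is any string -> string function, e.g. removePunctuation.
function cleanImmediateText(el, clean) {
  immediateTextNodes(el).forEach(node => {
    node.nodeValue = clean(node.nodeValue);
  });
}

// In a page this might be used as (not run here):
// document.querySelectorAll('p, div').forEach(el => cleanImmediateText(el, removePunctuation));
```

With this version a strong inside a div is only cleaned if strong itself is in your selector, and each text node is visited once even when a matched element sits inside another matched element.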
Upvotes: 3
Reputation: 1
Can't you just focus on the regular text tags?
Like just look in the <p>, <h1>, <h2>, …, <span>, <em>, <strong>, <textarea>, etc.
Upvotes: 0