I'm trying to highlight a reference text inside an HTML string, even when that text is split across HTML tags. My current approach sometimes wraps only part of the phrase and breaks the HTML structure.
For example, given this input:
let originalHtml = "This is a <b>sample</b> text with <i>some</i> formatting.";
let referenceText = "sample text";
let ideaId = 123;
let highlightTag = `highlight_${ideaId}`;
I expect this output:
This is a <highlight_123><b>sample</b> text</highlight_123> with <i>some</i> formatting.
The highlight tag should wrap the entire matched phrase and keep any HTML tags inside it intact.
I wrote this function to attempt the replacement:
function highlightTextIgnoringTags(originalHtml, referenceText, highlightTag) {
  // Escape special regex characters (including the backslash itself)
  let text = referenceText.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Allow whitespace and tags between the words of the phrase
  let textWithTags = text.replace(/\s/g, "(?:\\s|<[^>]+>)*");
  let regex = new RegExp(textWithTags, "gi"); // Case-insensitive, all matches
  // Wrap each match in an opening and closing highlight tag
  return originalHtml.replace(regex, (match) => `<${highlightTag}>${match}</${highlightTag}>`);
}
However, the output I get is incorrect:
This is a <b><highlight_123>sample</b> text</highlight_123> with <i>some</i> formatting.
This breaks the HTML because the <b> tag is opened before <highlight_123> but closed inside it, so the two elements overlap instead of nesting.
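To see why the wrapper lands inside <b>, it may help to run the generated pattern on its own. The snippet below is a small diagnostic sketch that hard-codes the pattern the function would build for "sample text"; the variable names are only illustrative.

```javascript
// The pattern built for the reference text "sample text".
const html = "This is a <b>sample</b> text with <i>some</i> formatting.";
const pattern = new RegExp("sample(?:\\s|<[^>]+>)*text", "gi");
const match = pattern.exec(html);

console.log(match[0]);    // "sample</b> text"
console.log(match.index); // 13, i.e. just after the opening <b>
```

The match starts at the first letter of "sample", after <b> has already been opened, yet it includes the closing </b>. Wrapping that substring as-is is what produces the overlapping tags.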
How can I modify my function to correctly wrap the entire matched phrase in a highlight tag while ensuring valid HTML? Is there a better approach using DOM parsing or another method?
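One possible direction, sketched below under several assumptions, is to search the tag-free text and then widen the matched range until the tags inside it are balanced. The helper names (highlightAcrossTags, parseTag, VOID_TAGS) are hypothetical. The sketch assumes well-formed HTML with no comments, no raw "<" in text, and no entities in the reference text. It handles only the first match and returns null when a single balanced wrapper is not possible.

```javascript
// Hypothetical helpers, not from the original post. Assumes well-formed HTML.
const VOID_TAGS = new Set(["br", "hr", "img", "input", "wbr"]);

function parseTag(value) {
  const m = /^<(\/?)([a-zA-Z][\w-]*)/.exec(value);
  if (!m) return null;
  const name = m[2].toLowerCase();
  if (value.endsWith("/>") || VOID_TAGS.has(name)) return null; // nothing to balance
  return { name, closing: m[1] === "/" };
}

function highlightAcrossTags(html, referenceText, tagName) {
  // 1. Tokenise into tags and text runs, keeping each token's offsets.
  const tokens = [];
  for (const m of html.matchAll(/<[^>]+>|[^<]+/g)) {
    tokens.push({ value: m[0], start: m.index, end: m.index + m[0].length, isTag: m[0][0] === "<" });
  }

  // 2. Build the tag-free text and map each character back to its HTML offset.
  let plain = "";
  const offsets = [];
  for (const t of tokens) {
    if (t.isTag) continue;
    for (let k = 0; k < t.value.length; k++) {
      plain += t.value[k];
      offsets.push(t.start + k);
    }
  }

  // 3. Find the first match in the plain text, tolerating whitespace differences.
  const escaped = referenceText.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const found = new RegExp(escaped.replace(/\s+/g, "\\s+"), "i").exec(plain);
  if (!found || found[0].length === 0) return null;
  let start = offsets[found.index];
  let end = offsets[found.index + found[0].length - 1] + 1;

  // 4. Collect tags inside the range that have no partner inside the range.
  let i = tokens.findIndex((t) => start >= t.start && start < t.end);
  let j = tokens.findIndex((t) => end > t.start && end <= t.end);
  const openStack = [];
  const strayClosers = [];
  for (let k = i; k <= j; k++) {
    const tag = tokens[k].isTag ? parseTag(tokens[k].value) : null;
    if (!tag) continue;
    if (!tag.closing) openStack.push(tag.name);
    else if (openStack[openStack.length - 1] === tag.name) openStack.pop();
    else strayClosers.push(tag.name);
  }

  // 5. Widen left over directly adjacent openers, right over adjacent closers.
  for (const name of strayClosers) {
    const prev = tokens[i - 1];
    const tag = prev && prev.isTag ? parseTag(prev.value) : null;
    if (start !== tokens[i].start || !tag || tag.closing || tag.name !== name) return null;
    i -= 1;
    start = prev.start;
  }
  while (openStack.length > 0) {
    const name = openStack.pop();
    const next = tokens[j + 1];
    const tag = next && next.isTag ? parseTag(next.value) : null;
    if (end !== tokens[j].end || !tag || !tag.closing || tag.name !== name) return null;
    j += 1;
    end = next.end;
  }

  return html.slice(0, start) + `<${tagName}>` + html.slice(start, end) + `</${tagName}>` + html.slice(end);
}

const input = "This is a <b>sample</b> text with <i>some</i> formatting.";
console.log(highlightAcrossTags(input, "sample text", "highlight_123"));
// This is a <highlight_123><b>sample</b> text</highlight_123> with <i>some</i> formatting.
```

When the phrase starts or ends in the middle of an element's text (for example "<b>a sample</b> text"), no single wrapper can be balanced, so the sketch returns null. In that situation a caller would probably have to wrap each text run separately.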