Saadi


How to Highlight Text in HTML While Ignoring Tags Without Breaking Valid HTML?

I'm trying to highlight text within an HTML string while ignoring any HTML tags inside the text. However, my current approach sometimes wraps only part of the text and breaks the HTML structure.

What I'm Trying to Achieve

I want to highlight a reference text even when it is split across HTML tags, without breaking the HTML structure.

Example Input & Expected Output

Input:
let originalHtml = "This is a <b>sample</b> text with <i>some</i> formatting.";
let referenceText = "sample text";
let ideaId = 123;
let highlightTag = `highlight_${ideaId}`;

Expected Output (Valid HTML):
This is a <highlight_123><b>sample</b> text</highlight_123> with <i>some</i> formatting.

The highlight tag should wrap the entire matched phrase, preserving HTML tags inside it.


What I Have Tried

I wrote this function to attempt the replacement:

function highlightTextIgnoringTags(originalHtml, referenceText, highlightTag) {
    // Escape special regex characters, including the backslash itself
    let text = referenceText.replace(/[-/\\^$*+?.()|[\]{}]/g, "\\$&");
    // Allow any run of whitespace and tags wherever the reference text has whitespace
    let textWithTags = text.replace(/\s+/g, "(?:\\s|<[^>]+>)*");
    let regex = new RegExp(`(${textWithTags})`, "gi"); // Global, case-insensitive regex

    // Wrap the raw match in the highlight tag (the tag name has no "$0" placeholder to replace)
    return originalHtml.replace(regex, (match) => `<${highlightTag}>${match}</${highlightTag}>`);
}

However, the output I get is incorrect:

This is a <b><highlight_123>sample</b> text</highlight_123> with <i>some</i> formatting.

This breaks the HTML because the <b> tag is opened before <highlight_123> but closed inside it, so the two elements overlap instead of nesting.
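
The overlap seems to come from where the regex match begins. As a quick check, here is the pattern the function builds for "sample text", written out by hand (the variable names are only for illustration). The match starts at "sample", after the opening <b>, but runs past the closing </b>:

```javascript
// Pattern built for the reference text "sample text", written out by hand.
const builtPattern = /(sample(?:\s|<[^>]+>)*text)/gi;
const sampleHtml = "This is a <b>sample</b> text with <i>some</i> formatting.";

// With the "g" flag, match() returns every full match as a string.
const found = sampleHtml.match(builtPattern);
console.log(found); // [ 'sample</b> text' ]
```

Wrapping that matched string as-is puts <highlight_123> after <b> and </highlight_123> after </b>, which is exactly the overlap shown above.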

How can I modify my function to correctly wrap the entire matched phrase in a highlight tag while ensuring valid HTML? Is there a better approach using DOM parsing or another method?
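
One direction that appears to work for the example above, without a DOM, is to search the tag-free text and then grow the matched range until every tag inside it has its partner inside it too. The sketch below is only that, a sketch: highlightAcrossTags is a made-up name, it matches whitespace exactly (unlike the regex version), it highlights only the first occurrence, and it assumes well-formed HTML with no comments, no void or self-closing elements inside the match, no ">" inside attribute values, and text whose length does not change when lowercased. When the range cannot be grown cleanly (for example, the match starts in the middle of a <b> element and ends outside it), it returns the input unchanged.

```javascript
// Sketch only; highlightAcrossTags is a hypothetical helper, not an existing API.
function highlightAcrossTags(html, referenceText, highlightTag) {
  // Split into text and tag tokens; joining them back reproduces html exactly.
  const tokens = html.split(/(<[^>]+>)/).filter((t) => t !== "");
  const isTag = (t) => /^<[^>]+>$/.test(t);
  const isCloser = (t) => isTag(t) && t.startsWith("</");
  const nameOf = (t) => t.match(/^<\/?([^\s>\/]+)/)[1].toLowerCase();

  // Build the tag-free text, recording the source token and offset of each
  // character, plus the position of every token in the original string.
  let plain = "";
  const owner = [];
  const tokenStart = [];
  let pos = 0;
  tokens.forEach((tok, ti) => {
    tokenStart.push(pos);
    pos += tok.length;
    if (isTag(tok)) return;
    for (let k = 0; k < tok.length; k++) owner.push([ti, k]);
    plain += tok;
  });

  // Exact, case-insensitive search in the tag-free text (first occurrence only).
  const at = referenceText ? plain.toLowerCase().indexOf(referenceText.toLowerCase()) : -1;
  if (at === -1) return html;
  let [first, startOff] = owner[at];
  let [last, endOff] = owner[at + referenceText.length - 1];
  endOff += 1; // make the end offset exclusive

  // Find tags inside the range whose partner lies outside it.
  const unclosed = [];
  const strayClosers = [];
  for (let i = first + 1; i < last; i++) {
    if (!isTag(tokens[i])) continue;
    if (!isCloser(tokens[i])) unclosed.push(nameOf(tokens[i]));
    else if (unclosed.length) unclosed.pop();
    else strayClosers.push(nameOf(tokens[i]));
  }

  // Grow the range left to take in the opener of each stray closer...
  for (const name of strayClosers) {
    const prev = tokens[first - 1];
    if (startOff !== 0 || !prev || !isTag(prev) || isCloser(prev) || nameOf(prev) !== name) return html;
    first -= 1;
  }
  // ...and right to take in the closer of each unclosed opener.
  while (unclosed.length) {
    const name = unclosed.pop();
    const next = tokens[last + 1];
    if (endOff !== tokens[last].length || !next || !isCloser(next) || nameOf(next) !== name) return html;
    last += 1;
    endOff = next.length;
  }

  const from = tokenStart[first] + startOff;
  const to = tokenStart[last] + endOff;
  return `${html.slice(0, from)}<${highlightTag}>${html.slice(from, to)}</${highlightTag}>${html.slice(to)}`;
}

console.log(highlightAcrossTags(
  "This is a <b>sample</b> text with <i>some</i> formatting.",
  "sample text",
  "highlight_123"
));
// This is a <highlight_123><b>sample</b> text</highlight_123> with <i>some</i> formatting.
```

In a browser, parsing the string into a DOM and walking its text nodes would probably be more robust than splitting on a tag regex, since the parser handles comments, attributes, and void elements.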
