Reputation: 119
I'm trying to write a highlight plugin and would like to preserve HTML formatting. Is it possible to ignore all the characters between < and > in a string when doing a replace in JavaScript?
Using the following as an example:
var string = "Lorem ipsum dolor span sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";
I would like to be able to achieve the following (replace 'dolor' with 'FOO'):
var string = "Lorem ipsum FOO span sit amet, consectetuer <span class='dolor'>FOO</span> adipiscing elit.";
Or perhaps even this (replace 'span' with 'BAR'):
var string = "Lorem ipsum dolor BAR sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";
I came very close with an answer given by tambler to the question "Can you ignore HTML in a string while doing a Replace with jQuery?", but for some reason I just can't get the accepted answer to work.
I'm completely new to regex, so any help would be greatly appreciated.
Upvotes: 5
Views: 4096
Reputation: 1
Tim Down's answer provides a nice function. If you want the replacement text to contain HTML, a small change is enough. The regex has to contain a capture group "()" for $1 to work, for example:
let regex = new RegExp('(' + textToReplace + ')', 'gi');
const textReplacerFunc = function(textNode, regex) {
    // Note: assigning to parentNode.innerHTML replaces all of the parent's children,
    // so this only behaves as intended when the text node is the parent's only child
    textNode.parentNode.innerHTML = textNode.data.replace(regex, '<span class="highlight">$1</span>');
};
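One thing to watch for with the innerHTML assignment above: a text node's data is plain text, so characters such as < or & in it would be parsed as markup once assigned to innerHTML. Below is a minimal sketch of building the highlighted markup as a string while escaping the text first. The helper names escapeHtml and highlightToHtml are hypothetical and not part of either answer.

```javascript
// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Hypothetical helper: wrap every match of `regex` in a highlight span,
// escaping both the matched and the unmatched parts of the text.
function highlightToHtml(text, regex) {
    if (!regex.global) {
        // Without the "g" flag, exec() would restart at index 0 forever
        throw new Error("regex must have the 'g' flag");
    }
    var html = "";
    var last = 0;
    var m;
    regex.lastIndex = 0;
    while ((m = regex.exec(text)) !== null) {
        if (m[0] === "") {
            // Skip empty matches so the loop always advances
            regex.lastIndex++;
            continue;
        }
        html += escapeHtml(text.slice(last, m.index));
        html += '<span class="highlight">' + escapeHtml(m[0]) + "</span>";
        last = m.index + m[0].length;
    }
    return html + escapeHtml(text.slice(last));
}

var markup = highlightToHtml("a < dolor & Dolor", /dolor/gi);
```

The resulting string could then be assigned to the innerHTML of a new wrapper element that takes the place of the text node, instead of overwriting the parent's content.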
Upvotes: 0
Reputation: 324547
Parsing the HTML using the browser's built-in parser via innerHTML, followed by DOM traversal, is the sensible way to do this. Here's a solution loosely based on another answer:
Live demo: http://jsfiddle.net/FwGuq/1/
Code:
// Reusable generic function
function traverseElement(el, regex, textReplacerFunc) {
    // script and style elements are left alone
    // (tagName is upper case for HTML elements, hence the "i" flag)
    if (!/^(script|style)$/i.test(el.tagName)) {
        var child = el.lastChild;
        while (child) {
            if (child.nodeType === 1) { // element node
                traverseElement(child, regex, textReplacerFunc);
            } else if (child.nodeType === 3) { // text node
                textReplacerFunc(child, regex);
            }
            child = child.previousSibling;
        }
    }
}

// This function does the replacing for every matched piece of text
// and can be customized to do what you like
function textReplacerFunc(textNode, regex) {
    textNode.data = textNode.data.replace(regex, "FOO");
}

// The main function
function replaceWords(html, words) {
    var container = document.createElement("div");
    container.innerHTML = html;

    // Replace the words one at a time to ensure each one gets matched
    for (var i = 0, len = words.length; i < len; ++i) {
        traverseElement(container, new RegExp(words[i], "g"), textReplacerFunc);
    }
    return container.innerHTML;
}

var html = "Lorem ipsum dolor span sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";
alert( replaceWords(html, ["dolor"]) );
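Note that new RegExp(words[i], "g") treats each word as a regex pattern, so a word containing characters such as . or ( would not be matched literally. Here is a small sketch of escaping the words first. The helpers escapeRegExp and wordToRegex are hypothetical names and not part of the answer above.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a global regex that matches `word` literally.
function wordToRegex(word) {
    return new RegExp(escapeRegExp(word), "g");
}

// Unescaped, "a.b" would also match "axb"; escaped, only the literal text matches.
var unescaped = "a.b axb".replace(new RegExp("a.b", "g"), "FOO");
var escaped = "a.b axb".replace(wordToRegex("a.b"), "FOO");
```

In replaceWords, the call could then become traverseElement(container, wordToRegex(words[i]), textReplacerFunc), assuming literal matching is what is wanted.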
Upvotes: 6
Reputation: 121712
This solution works with Perl, and should also work with JavaScript, since the pattern only uses features that ECMA-262 regular expressions support:
s,\bdolor\b(?=[^"'][^>]*>),FOO,g
Basically, the word is replaced only if it is followed by a character that is not a quote, then any run of characters other than >, and then the closing > itself.
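For reference, a rough JavaScript translation of the Perl substitution might look like the sketch below, applied to the string from the question. The variable names are illustrative only.

```javascript
var input = "Lorem ipsum dolor span sit amet, consectetuer <span class='dolor'>dolor</span> adipiscing elit.";

// Same pattern as the Perl s,...,...,g form, written as a JavaScript regex literal
var dolorResult = input.replace(/\bdolor\b(?=[^"'][^>]*>)/g, "FOO");

// The same pattern applied to the 'span' case from the question
var spanResult = input.replace(/\bspan\b(?=[^"'][^>]*>)/g, "BAR");

// A word with no ">" anywhere after it is left untouched
var trailingResult = "<b>x</b> dolor".replace(/\bdolor\b(?=[^"'][^>]*>)/g, "FOO");
```

With this input, dolorResult matches the first desired output in the question. The pattern is a heuristic, though: spanResult also rewrites the tag name in the opening <span, a word that has no > after it is never replaced, and a word inside a quoted attribute is only skipped when it sits directly before the closing quote.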
Upvotes: 1