Reputation: 912
I need a little help with regular expressions.
I'm using JavaScript and jQuery to hyperlink terms within an HTML document, using the code below. I need to do this for a number of terms in a massive document.
var searchterm = "Water";
jQuery('#content p').each(function() {
  var content = jQuery(this),
      txt = content.html(),
      found = content.find(searchterm).length,
      regex = new RegExp('(' + searchterm + ')(?![^(<a.*?>).]*?<\/a>)','gi');
  if (found != -1) {
    //hyperlink the search term
    txt = txt.replace(regex, '<a href="/somelink">$1</a>');
    content.html(txt);
  }
});
There are, however, a number of instances I do not want to match, and due to time constraints and brain melt I'm reaching out for some assistance.
EDIT: I've updated the codepen below based on the excellent example provided by @ggorlen, thank you!
Example https://codepen.io/julian-young/pen/KKwyZMr
Upvotes: 1
Views: 196
Reputation: 56875
Dumping the entire DOM to raw text and parsing it with regex circumvents the primary purpose of jQuery (and JS, by extension), which is to traverse and manipulate the DOM as an abstract tree of nodes.
Text nodes have a nodeType of Node.TEXT_NODE, which we can use in a traversal to identify the non-link nodes you're interested in.
After obtaining a text node, regex can be applied appropriately (parsing text, not HTML). I used <mark> for demonstration purposes, but you can make this an anchor tag or whatever you need.
jQuery gives you a replaceWith method that replaces a node with new content, here the result of the regex substitution. Because .contents() returns only the direct children of each matched element, text nested inside an <a> is never visited.
$('#content li').contents().each(function () {
  if (this.nodeType === Node.TEXT_NODE) {
    var pattern = /(\b[Ww]aters?(?!-)\b)/g;
    var replacement = '<mark>$1</mark>';
    $(this).replaceWith(this.nodeValue.replace(pattern, replacement));
  }
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<h1>Example Content</h1>
<div id="content">
<ul>
<li>Water is a fascinating subject. - <strong>match</strong></li>
<li>We all love water. - <strong>match</strong></li>
<li>ice; water; steam - <strong>match</strong></li>
<li>The beautiful waters of the world - <strong>match</strong> (including the s)</li>
<li>and all other water-related subjects - <strong>no match</strong></li>
<li>and this watery topic of - <strong>no match</strong></li>
<li>of WaterStewardship looks at how best - <strong>no match</strong></li>
<li>On the topic of <a href="/governance">water governance</a> - <strong>no match</strong></li>
<li>and other <a href="/water">water</a> related things - <strong>no match</strong></li>
<li>the best of <a href="/allthingswater">all things water</a> - <strong>no match</strong></li>
</ul>
</div>
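To check the pattern against the example sentences without a browser, the substitution can be run on plain strings. This is only a minimal sketch: markWater is a hypothetical helper name, and the sample strings are copied from the text portions of the list items above.

```javascript
// Same pattern as in the jQuery snippet above.
const pattern = /(\b[Ww]aters?(?!-)\b)/g;

// Hypothetical helper: wrap each match in <mark>, leaving other text untouched.
function markWater(text) {
  return text.replace(pattern, "<mark>$1</mark>");
}

// Text-node contents taken from the example list.
const samples = [
  "Water is a fascinating subject.",
  "We all love water.",
  "ice; water; steam",
  "The beautiful waters of the world",
  "and all other water-related subjects",
  "and this watery topic of",
  "of WaterStewardship looks at how best",
];

for (const sample of samples) {
  console.log(markWater(sample));
}
```

The trailing \b rejects "watery" and "WaterStewardship" because the next character is a word character, while (?!-) rejects "water-related". Note that, unlike the 'gi' flags in the question, [Ww] only accepts a capital or lowercase first letter, so "WATER" would not match.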
You can also do this without jQuery and apply it to every non-link element in the document:
// Visit every element under <body> except the links themselves.
for (const parent of document.querySelectorAll("body *:not(a)")) {
  for (const child of parent.childNodes) {
    // Only direct text-node children are rewritten.
    if (child.nodeType === Node.TEXT_NODE) {
      const pattern = /(\b[Ww]aters?(?!-)\b)/g;
      const replacement = "<mark>$1</mark>";
      // Swap the text node for a <span> holding the marked-up text.
      const subNode = document.createElement("span");
      subNode.innerHTML = child.textContent.replace(pattern, replacement);
      parent.insertBefore(subNode, child);
      parent.removeChild(child);
    }
  }
}
<div>
hello water
<div>
<div>
I love Water.
<a href="">more water</a>
</div>
watership down
<h4>watery water</h4>
<p>
waters
</p>
foobar <a href="">water</a> water
</div>
</div>
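The question builds its pattern from a searchterm variable and emits anchors rather than <mark>. One way to combine that with the text-node approach is sketched below. The names escapeRegExp, escapeHtml and linkTerm are hypothetical helpers, not part of jQuery or the DOM API, and the sketch assumes the term is a plain word so that \b behaves as expected. It also escapes the surrounding text, since a text node's content may contain characters such as < or & that would otherwise be parsed as markup when assigned to innerHTML.

```javascript
// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Escape text so it is safe to place inside HTML markup.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: wrap whole-word occurrences of `term` (with an
// optional trailing "s", and not followed by a hyphen) in a link to `href`.
function linkTerm(text, term, href) {
  const pattern = new RegExp("\\b(" + escapeRegExp(term) + "s?)(?!-)\\b", "gi");
  let out = "";
  let last = 0;
  for (const match of text.matchAll(pattern)) {
    // Escape the plain text before the match, then emit the link.
    out += escapeHtml(text.slice(last, match.index));
    out += '<a href="' + escapeHtml(href) + '">' + escapeHtml(match[0]) + "</a>";
    last = match.index + match[0].length;
  }
  // Escape whatever plain text remains after the final match.
  return out + escapeHtml(text.slice(last));
}

console.log(linkTerm("We all love water.", "Water", "/somelink"));
```

In the loop above, the result could be assigned to subNode.innerHTML in place of the child.textContent.replace(...) call, for example linkTerm(child.textContent, searchterm, "/somelink").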
Upvotes: 2