emberfiend

Reputation: 65

Whole-word matching with jQuery and contains()

I'm writing a Greasemonkey script to selectively hide elements containing nasty stuff (a personal web sanitizer, if you will).

Here's what I've got so far:

//custom contains function which is case-insensitive
$.extend($.expr[":"], {
  "containsNC": function(elem, i, match, array) {
    return (elem.textContent || elem.innerText || "").toLowerCase().indexOf((match[3] || "").toLowerCase()) >= 0;
  }
});

//build array of words to filter
var nope = "long list of horrible words".toLowerCase().split(' ');

//start with an empty jQuery object
var nopeEles = $();

//add elements to filter to it
for (var i = 0; i < nope.length; i++) {
  nopeEles = nopeEles.add( $("a:containsNC('" + nope[i] + "')") );
  nopeEles = nopeEles.add( $("p:containsNC('" + nope[i] + "')") );
}

//hide all applicable elements
nopeEles.css("background-color", "white");
nopeEles.css("color", "white");

It works decently, but it matches partial words, which makes short words unusable. I want to filter elements containing words like "die" and "gun" without filtering those with words like "candied" or "gung-ho".

To be clear, I'm after whole-word, not exact-text. I want "gun" in the list to match not just "gun" but also "he fired a gun" and "a gun was fired". And not "gunney sergeant".

Every other answer I've seen on this topic recommends jQuery's filter(), but I don't think I understand it well enough. I tried using this line in the loop, but it matched nothing:

nopeEles = nopeEles.add( $("a").filter(function() { return $(this).text() === nope[i]; }) );
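jQuery's .filter(function) keeps only the elements for which the callback returns true, much like Array.prototype.filter. The DOM-free analogy below suggests why the line above finds almost nothing: `===` demands that an element's entire text equal the term. Plain strings stand in for each <a> element's .text(); the names `linkTexts`, `exact`, `wordRe` and `whole` are illustrative only, and the regex predicate is just one possible replacement:

```javascript
// Plain strings stand in for $(this).text() of each <a> element (illustrative data).
var linkTexts = ["gun", "he fired a gun", "a gun was fired", "gunney sergeant", "gung-ho"];
var term = "gun";

// The predicate from the question: the WHOLE text must equal the term.
var exact = linkTexts.filter(function (t) { return t === term; });

// Whole-word alternative: \b is a word boundary, "i" makes it case-insensitive.
// Assumes `term` contains no regex metacharacters.
var wordRe = new RegExp("\\b" + term + "\\b", "i");
var whole  = linkTexts.filter(function (t) { return wordRe.test(t); });

console.log(exact);  // only "gun"
console.log(whole);  // "gun", "he fired a gun", "a gun was fired"
```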

The other angle I thought to look at was fiddling with containsNC so it looks for the word, but with whitespace or end-of-string on either side. I don't really get how containsNC works, though.
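For reference, jQuery's selector engine calls a custom pseudo-selector like containsNC once per candidate element, and match[3] holds the selector's argument, e.g. gun in a:containsNC('gun'). Below is a DOM-free sketch of the same test (with `text` standing in for elem.textContent), next to a hypothetical containsWordNC that implements the whitespace/start/end-of-string idea. Both function names and escapeRegExp are made up for illustration. Note the last case: punctuation right after the word defeats a whitespace-only rule.

```javascript
// Restatement of containsNC's test: case-insensitive substring search.
// `text` stands in for elem.textContent, `term` for match[3].
function containsNC(text, term) {
  return (text || "").toLowerCase().indexOf((term || "").toLowerCase()) >= 0;
}

// Escape regex metacharacters so the term is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical whole-word variant: the term must be preceded by start-of-string
// or whitespace, and followed by whitespace or end-of-string.
function containsWordNC(text, term) {
  var re = new RegExp("(^|\\s)" + escapeRegExp(term) + "(?=\\s|$)", "i");
  return re.test(text || "");
}

console.log(containsNC("candied yams", "die"));        // true (substring match)
console.log(containsWordNC("candied yams", "die"));    // false
console.log(containsWordNC("he fired a gun", "gun"));  // true
console.log(containsWordNC("A gun was fired", "GUN")); // true
console.log(containsWordNC("gunney sergeant", "gun")); // false
console.log(containsWordNC("he fired a gun.", "gun")); // false: "." is not whitespace
```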

Any pointers would be hugely appreciated!

Upvotes: 1

Views: 1408

Answers (1)

Brock Adams

Reputation: 93473

That containsNC is just a subpar version of this :containsCI() jQuery extension.
("NC" == "no case" ≈≈ "CI" == "Case insensitive".)

Use the linked jQuery extension instead; then you can use a regex to match whole words, like:

nopeEles = nopeEles.add( $("a:containsCI('\\b" + nope[i] + "\\b')") );

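However, the question's code is rather inefficient and will slow the page: it runs 2N separate selector queries (where N is the number of terms), and across those queries every <a> and <p> node gets a substring scan once per term, roughly N × J text scans (where J is the number of <a> and <p> nodes). Each of the 2N .add() calls also re-merges and re-sorts the growing result set.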

A more performant way is to scan each node only once by merging all the terms into a single regex. See this demo:

//-- Case-insensitive :containsCI() selector; its argument is used as a regex:
jQuery.extend (jQuery.expr[':'], {
    containsCI: function (a, i, m) {
        var sText   = (a.textContent || a.innerText || "");
        var zRegExp = new RegExp (m[3], 'i');
        return zRegExp.test (sText);
    }
} );

//-- Build array of terms to filter:
var badTerms    = ['die', 'guns?', 'agitators?'];
//-- Build ONE regex string for speed and efficiency:
var cnsrRegEx   = `\\b(${badTerms.join ("|")})\\b`;  //  \b is the regex word-boundary assertion.

var nopeEles    = $("a, p").filter (":containsCI('" + cnsrRegEx + "')");

//-- Hide all applicable elements:
nopeEles.css ( {
    "background-color": "white",
    "color": "white"
} );

/* CSS for the demo: */
a, p {border: 1px solid lightgray; padding: 0.3ex 1ex;}

<!-- HTML for the demo: -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js"></script>
<p>All good</p>
<p>All bad agitators</p>
<div>Some bad: <a>die</a> <a>gun</a> <a>candied</a> <a>gung-ho</a> <a>guns</a>
  <a>he fired a gun</a> <a>gunney sergeant</a>
</div>
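To check which of the demo's texts the merged pattern catches, the same regex can be exercised in plain JavaScript, with no jQuery or DOM. This sketch reuses the demo's terms and pattern-building step; `texts` simply copies the demo's element texts, and `re` mirrors what containsCI builds from m[3]:

```javascript
// Same terms and pattern-building step as the demo.
var badTerms  = ['die', 'guns?', 'agitators?'];
var cnsrRegEx = `\\b(${badTerms.join("|")})\\b`;  // the string \b(die|guns?|agitators?)\b
var re        = new RegExp(cnsrRegEx, 'i');         // what containsCI builds from m[3]

// Texts copied from the demo's <p> and <a> elements.
var texts = ["All good", "All bad agitators", "die", "gun", "candied",
             "gung-ho", "guns", "he fired a gun", "gunney sergeant"];

texts.forEach(function (t) {
  console.log(JSON.stringify(t), re.test(t) ? "hidden" : "kept");
});
```

Only "All good", "candied", "gung-ho" and "gunney sergeant" come out as kept: inside "candied" and "gunney" there is no word boundary around "die" or "gun".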

Notes:

  1. A regex like guns? matches both "gun" and "guns".
  2. Since we are building a string that will be converted to a regex, backslashes must be escaped; that is, use "\\b" in the string to get \b in the regex.
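A small illustration of note 2: in a JavaScript string or template literal, an unescaped "\b" is the backspace character (U+0008), not the regex word-boundary token, so the resulting regex silently looks for literal backspaces. The variable names here are illustrative:

```javascript
// In a JS string literal, "\b" is the backspace character (U+0008).
var wrong = "\bgun\b";    // backspace + "gun" + backspace
var right = "\\bgun\\b";  // backslash, "b", "gun", backslash, "b"

console.log(wrong.length);                     // 5
console.log(right.length);                     // 7
console.log(new RegExp(wrong).test("a gun"));  // false: looks for literal backspaces
console.log(new RegExp(right).test("a gun"));  // true: \b word boundaries
```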

Upvotes: 2
