LouieV

Reputation: 1052

Javascript dynamic regex

After looking here I came up with a pattern to test an array of words against a string.

$.each(data, function(index, val) {
    var pattern = new RegExp('?:^|\s'+ val + '?=\s|$', 'g');
    console.log(pattern.test(comment));
    if (!pattern.test(comment)) {                           
           yay = true;
         }
});

The problem is that it returns true all the time. Any suggestions? Thanks!

Upvotes: 0

Views: 3550

Answers (2)

qJake

Reputation: 17139

Fix this line:

var pattern = new RegExp('(?:^|\\s)' + val + '(?=\\s|$)', 'g');

You may also find it useful to debug/validate your regex using this online utility:

http://gskinner.com/RegExr/

To debug there, you will need to substitute a sample value for your val variable.
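Two details are easy to miss when building this pattern from a string. The sketch below uses made-up values for `val` and `comment`, so treat it as an illustration rather than a drop-in fix. First, inside a string literal `'\s'` collapses to a plain `s`, so the backslash has to be doubled. Second, a regex created with the `g` flag remembers its position between `test` calls, so calling `test` twice on the same object (as the question's code does) can give two different answers.

```javascript
// Sample values; `val` and `comment` here are placeholders for illustration.
var val = 'foo';
var comment = 'some foo here';

// In a string literal, '\s' is just 's', so the whitespace class is lost.
var broken = new RegExp('(?:^|\s)' + val + '(?=\s|$)');
// Doubling the backslash keeps \s intact in the final pattern.
var fixed = new RegExp('(?:^|\\s)' + val + '(?=\\s|$)');

var brokenResult = broken.test(comment);
var fixedResult = fixed.test(comment);

// With the g flag, test() resumes from lastIndex, so a second call can fail.
var globalPattern = new RegExp('(?:^|\\s)' + val + '(?=\\s|$)', 'g');
var firstCall = globalPattern.test(comment);
var secondCall = globalPattern.test(comment);

console.log(broken.source, brokenResult); // (?:^|s)foo(?=s|$) false
console.log(fixed.source, fixedResult);   // (?:^|\s)foo(?=\s|$) true
console.log(firstCall, secondCall);       // true false
```

If the result is needed more than once, storing it in a variable (or dropping the `g` flag) avoids the second surprise.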

Upvotes: 0

FrankieTheKneeMan

Reputation: 6800

From your JsFiddle, I forked and created one of my own, and your solution (with my regular expression from the comments) works quite well, once all the minor typos are cleared up. However, it could be much, much cleaner and faster. Here's what I did differently:

$('#send-btn').on('click', function () {
    $('#error').hide();
    // One pattern that matches any word in list as a whole word, ignoring case.
    var pattern = new RegExp('\\b(' + list.join('|') + ')\\b', 'i');
    var comment = $('#comment').val();
    if (pattern.test(comment)) {
        $('#error').show();
    }
});

Specifically, the pattern I generated takes advantage of JavaScript's built-in Array.join, which pastes an array of strings together with a given separator. This builds a string with all of your search words separated by the regular expression alternation operator (|). By surrounding that alternation with parentheses, I can apply the word boundary assertion (\b) to either end to make sure we're matching only entire words.

In other news: you really don't need the g (global) modifier if you're just doing a simple test. You may need it in other applications, such as if you wanted to highlight the offending word, but for this I dropped it. You SHOULD be using the i modifier for case-insensitive behaviour.
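As a minimal sketch of the construction described above, without the jQuery handler: the word list and the `containsListedWord` helper are hypothetical stand-ins, since the real `list` lives in the fiddle.

```javascript
// Hypothetical word list; the real `list` comes from the original fiddle.
var list = ['darn', 'heck', 'drat'];

// Join the words with | and wrap them in a group so \b applies to every alternative.
var pattern = new RegExp('\\b(' + list.join('|') + ')\\b', 'i');

// Hypothetical helper name, used here only to make the checks readable.
function containsListedWord(comment) {
    return pattern.test(comment);
}

console.log(pattern.source);                       // \b(darn|heck|drat)\b
console.log(containsListedWord('Well, HECK.'));    // true
console.log(containsListedWord('checkered flag')); // false
console.log(containsListedWord('all good here'));  // false
```

The second check shows what the word boundaries buy: "heck" inside "checkered" is not reported, because no boundary sits between the "c" and the "h".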

The biggest upside to this is that you could, if you wanted to, define your regular expression once outside this function instead of rebuilding it on every click, and you'll see pretty significant speed gains.

Downside: There are diminishing returns as your list of foul words gets longer. But given this benchmark, it'll be a while before your way is better (a long while).

NOTE

You should escape your words before you use them in a regular expression. In your list, for instance, 'a.s.s' will match 'alsls'. While that is gibberish, it's not really a swear word, and you can easily see how such a problem could lead to finding profanity where there is none. However, you may choose to handle this outside the function, perhaps even leveraging the power of regular expressions in your word definitions (define '[a@][$s]{2}' instead of 'ass', '@ss', 'a$s', 'as$', '@$s', '@s$', 'a$$', and '@$$'), so I'm not going to address that here.
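For anyone who does want the escaping, here is one possible sketch. The `escapeRegExp` helper is not part of the original fiddle and JavaScript had no built-in for this at the time, so its name and the sample list are assumptions made for illustration.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter in a word.
function escapeRegExp(word) {
    return word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var list = ['a.s.s', 'heck']; // sample words, for illustration only

// Without escaping, each . in 'a.s.s' matches any character.
var unescaped = new RegExp('\\b(' + list.join('|') + ')\\b', 'i');
// With escaping, each . matches only a literal dot.
var escaped = new RegExp('\\b(' + list.map(escapeRegExp).join('|') + ')\\b', 'i');

console.log(unescaped.test('alsls')); // true
console.log(escaped.test('alsls'));   // false
console.log(escaped.test('a.s.s'));   // true
```

Escaping only makes sense for entries meant as literal words; entries written deliberately as regex fragments, like the character-class example above, would have to skip it. Note also that \b only asserts a boundary next to a word character, so an entry that starts or ends with @ or $ may need a different boundary check.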

Good luck, and happy regexing.

Upvotes: 1
