Fuxi

Reputation: 7589

Newbie regex question - detect spam

Here are my regex newbie questions:

Upvotes: 2

Views: 938

Answers (2)

user181548


How can I check if a string has 3 spam words? (for example: viagra, pills and shop)

A regex to spot any one of those three words might look like this (Perl):

if ($string =~ /(viagra|pills|shop)/) {
    # spam
}
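
Note that the pattern above also matches inside longer words such as "workshop". A minimal sketch of a stricter variant, assuming whole-word, case-insensitive matching is what you want ($spam_word is just an illustrative name):

```perl
use strict;
use warnings;

# \b anchors the match at word boundaries; /i ignores case.
my $spam_word = qr/\b(?:viagra|pills|shop)\b/i;

for my $text ("Visit our SHOP today", "a woodworking workshop") {
    my $verdict = $text =~ $spam_word ? "spam word" : "clean";
    print "$text: $verdict\n";
}
```

The trade-off is that word boundaries stop the pattern from catching a spam word glued to other letters, so this fits the plain-word case only.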

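If you want to spot all three, a single alternation isn't enough, because it matches as soon as any one word appears. Record which distinct words matched instead:

my %seen;
while ($string =~ /(viagra|pills|shop)/g) {
     $seen{$1} = 1;    # count each distinct word once, however often it repeats
}
if (scalar(keys %seen) >= 3) {
     # spam
}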
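
That said, if you would rather keep it to a single pattern, lookahead assertions can require all three words at once, in any order. A sketch, where has_all_three is a made-up helper name and the word list is the one from the question:

```perl
use strict;
use warnings;

# Each (?=...) lookahead is tried from the start of the string, so the
# pattern only matches when all three words occur somewhere in it.
# /s lets . cross newlines; /i ignores case.
my $all_three = qr/\A(?=.*viagra)(?=.*pills)(?=.*shop)/si;

sub has_all_three {
    my ($text) = @_;
    return $text =~ $all_three ? 1 : 0;
}

print has_all_three("shop for pills and viagra"), "\n";   # prints 1
print has_all_three("viagra viagra viagra"), "\n";        # prints 0
```

This gets unwieldy as the list grows, so the loop is probably easier to maintain for more than a handful of words.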

How can I also detect variations of those spam words like "v-iagra" or "v.iagra"? (one additional character)

It's not so easy to do that with just a regex. You could try something like

 $string =~ s/\W//g;

to remove all non-word characters like . and -, and then check the string using the test above. Note that this strips spaces too, so adjacent words run together, and it leaves underscores in place because \w includes _.
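
An alternative that leaves the string untouched is to allow one optional separator between the letters of each word. This is only a sketch: obfuscation_pattern and count_spam_words are hypothetical helper names, and "separator" is assumed to mean any single character that is neither a letter nor whitespace.

```perl
use strict;
use warnings;

# Build a pattern that tolerates at most one non-letter, non-space
# character between adjacent letters of $word (e.g. "v-iagra").
sub obfuscation_pattern {
    my ($word) = @_;
    my $body = join '[^a-z\s]?', map { quotemeta($_) } split //, $word;
    return qr/$body/i;
}

my @patterns = map { obfuscation_pattern($_) } qw(viagra pills shop);

# Number of distinct blacklisted words found in $text.
sub count_spam_words {
    my ($text) = @_;
    return scalar grep { $text =~ $_ } @patterns;
}

print count_spam_words("buy v-iagra and p.ills in our shop"), "\n";   # prints 3
```

Because spaces are not accepted as separators, a spelling like "v iagra" is not caught; widening the character class would catch it at the cost of more false positives.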

Upvotes: 2

Chris Tonkinson

Reputation: 14459

Regex doesn't seem like quite the right hammer for this particular nail. For your list, you can simply throw all of your blacklisted words into a sorted list or hash, and check each token against it. Direct string comparisons are usually faster than invoking the regular expression engine du jour.
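
A rough sketch of that lookup, using a hash as the blacklist. The tokenising rule (lowercase, split on anything that is not a-z) and the name blacklisted_words are assumptions for illustration:

```perl
use strict;
use warnings;

my %blacklist = map { $_ => 1 } qw(viagra pills shop);

# Return the distinct blacklisted words that occur in $text.
sub blacklisted_words {
    my ($text) = @_;
    my %hits;
    for my $token (split /[^a-z]+/, lc $text) {
        $hits{$token} = 1 if $blacklist{$token};
    }
    my @found = sort keys %hits;
    return @found;
}

my @found = blacklisted_words("Cheap VIAGRA pills at our online shop, pills shipped fast");
print "spam\n" if @found >= 3;
```

The split still uses a small regex to find token boundaries; the per-word check itself is a plain hash lookup.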

For your variations ("v-iagra", etc.), I'd remove all non-word characters (as @Kinopiko suggested) and then run the tokens past your blacklist again. If you're wary of things like "viiagra", I'd check out Aspell. It's a great library, and it looks like CPAN has a Perl binding.
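
If pulling in Aspell is more than you need, a plain edit-distance check can flag near misses like "viiagra". To be clear, this is a different technique from Aspell, not its API: a small Levenshtein sketch with a threshold of 1 picked arbitrarily, and edit_distance is a hand-rolled helper.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Levenshtein distance: minimum number of single-character insertions,
# deletions, or substitutions needed to turn $s into $t.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @curr = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @curr, min($prev[$j] + 1, $curr[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @curr;
    }
    return $prev[-1];
}

my @blacklist = qw(viagra pills shop);
for my $token (qw(viiagra pils hello)) {
    my @near = grep { edit_distance($token, $_) <= 1 } @blacklist;
    print "$token looks like: @near\n" if @near;
}
```

A threshold of 1 will also flag innocent words such as "ship" (one substitution away from "shop"), so short blacklist entries probably need an exact match instead.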

Upvotes: 3
