Nate

Reputation: 28334

How to require full (not partial) match with regex for a bad word filter?

I'm writing a very basic commenting system and want to implement a simple, efficient bad words filter.

I'm aware of the problems associated with bad word filters and realize it's basically impossible to write one that keeps misspellings and innuendo out, but I just want to write a very simple one that keeps correctly spelled vulgar words from being displayed.

I found a list of about 400 bad words and put it into preg_replace() with this pattern:

/(these|are|bad|words|like|ass)/

The problem is that it replaces any occurrence of a listed word, even in the middle of a longer word. For example, assist becomes ist.
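A minimal PHP sketch of how a word list can be turned into the pattern above, and how that pattern produces the partial match. The six-word $badWords array is a hypothetical stand-in for the real 400-word list; preg_quote() is used so that any regex metacharacters in a list entry would be matched literally.

```php
<?php
// Hypothetical short list standing in for the ~400-word list.
$badWords = ['these', 'are', 'bad', 'words', 'like', 'ass'];

// Escape each entry for use inside a /.../ pattern, then join the entries
// into one alternation, giving the same pattern as in the question.
$quoted  = array_map(function ($w) { return preg_quote($w, '/'); }, $badWords);
$pattern = '/(' . implode('|', $quoted) . ')/';

// No word boundaries, so "ass" at the start of "assist" is matched and removed.
$result = preg_replace($pattern, '', 'assist');
echo $result, PHP_EOL; // ist
```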

Second question: instead of replacing the bad words with an empty string or with a fixed-width string such as ****, is there a way to replace each one with a string of asterisks the same length as the replaced word?

Upvotes: 1

Views: 3813

Answers (3)

Mike H-R

Reputation: 7815

First, you want the word-boundary assertion \b. It is zero-width and matches the boundary between a word character and a non-word character (or the start/end of the string), so make your regex:

/\b(these|are|bad|words|like|ass)\b/
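A small sketch of the effect, assuming the same short list as the question: with \b around the alternation, "ass" inside "assist" no longer matches, while a standalone "ass" still does. The two input strings are hypothetical.

```php
<?php
// Same alternation as the question, wrapped in \b word-boundary assertions.
$pattern = '/\b(these|are|bad|words|like|ass)\b/';

// "ass" inside "assist" is followed by the word character "i",
// so there is no boundary there and nothing is replaced.
$kept = preg_replace($pattern, '', 'assist');

// Here "ass" is a whole word, so it is removed (the space before it stays).
$removed = preg_replace($pattern, '', 'what an ass');

var_dump($kept, $removed);
// string(6) "assist"
// string(8) "what an "
```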

Second, to replace the match with a string of equal length, use a callback function that operates on the match, e.g. with preg_replace_callback().
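One way to write such a function, as a sketch: maskMatch is a hypothetical name for a named callback passed to preg_replace_callback(), which returns one asterisk per byte of the whole match. Since strlen() counts bytes, the length only equals the character count for single-byte (e.g. ASCII) text.

```php
<?php
// Hypothetical callback: $match[0] is the whole matched word; return the
// same number of asterisks. strlen() counts bytes, not multibyte characters.
function maskMatch(array $match): string
{
    return str_repeat('*', strlen($match[0]));
}

$pattern  = '/\b(these|are|bad|words|like|ass)\b/';
$filtered = preg_replace_callback($pattern, 'maskMatch', 'you are an ass');
echo $filtered, PHP_EOL; // you *** an ***
```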

Upvotes: 1

deceze

Reputation: 521995

// Replace each whole-word match with the same number of asterisks.
$filtered = preg_replace_callback(
    '/\b(these|are|bad|words|like|ass)\b/',
    function (array $match) { return str_repeat('*', strlen($match[1])); },
    $comment
);

\b is a word boundary and will suffice for most cases, though it won't be perfect for all of them.
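To make that caveat concrete, here is a small sketch run against a few hypothetical inputs. \w covers letters, digits and the underscore, and \b only matches between a \w character and a non-\w character, so a bad word glued to an underscore or a digit slips through, while one followed by a hyphen is caught.

```php
<?php
$pattern = '/\b(these|are|bad|words|like|ass)\b/';

// Hypothetical inputs; each result maps input => masked output.
$results = [];
foreach (['ass-hat', 'ass_hat', 'ass2'] as $input) {
    $results[$input] = preg_replace_callback(
        $pattern,
        function (array $m) { return str_repeat('*', strlen($m[0])); },
        $input
    );
    echo $input, ' => ', $results[$input], PHP_EOL;
}
// ass-hat => ***-hat  ("-" is not a word character, so there is a boundary)
// ass_hat => ass_hat  ("_" is a word character: no boundary, no match)
// ass2 => ass2        (digits are word characters too)
```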

Upvotes: 6

jeroen

Reputation: 91734

You could use word boundaries:

/\b(these|are|bad|words|like|ass)\b/

Upvotes: 3
