Reputation: 261
I'm working on a WordPress plugin that replaces the bad words from the comments with random new ones from a list.
I now have 2 arrays: one containing the bad words and another containing the good words.
$bad = array("bad", "words", "here");
$good = array("good", "words", "here");
Since I'm a beginner, I got stuck at some point.
In order to replace the bad words, I've been using: $newstring = str_replace($bad, $good, $string);
My first problem is that I want to turn off case sensitivity, so that I don't have to list every variant like "bad", "Bad", "BAD", "bAd", "BAd", etc., but I need the new word to keep the format of the original word. For example, if I write "Bad", it should be replaced with "Words", but if I type "bad", it should be replaced with "words".
My first thought was to use str_ireplace, but it forgets whether the original word had a capital letter.
The second problem is that I don't know how to deal with the users that type like this: "b a d", "w o r d s", etc. I need an idea.
In order to make it select a random word, I think I can use $new = $good[rand(0, count($good)-1)]; and then $newstring = str_replace($bad, $new, $string); If you have a better idea, I'm here to listen.
The general look of my script:
function noswear($string)
{
    if ($string)
    {
        $bad = array("bad", "words");
        $good = array("good", "words");
        $newstring = str_replace($bad, $good, $string);
        return $newstring;
    }
}
echo noswear("I see bad words coming!");
Thank you in advance for your help!
Upvotes: 9
Views: 8139
Reputation: 6148
There are (as has been pointed out in the comments numerous times) gaping holes for you - and/or your code - to fall into through implementing such a feature, to name but a few:
You'd do better to implement a moderation/flagging system where people can flag offensive comments which can then be edited/removed by mods, users, etc.
On that understanding, let us proceed...
Given that you have a list of bad words ($bad_words) and a list of good words ($good_words), you can very easily use PHP's preg_replace_callback function:
$input_string = 'This Could be interesting but should it be? Perhaps this \'would\' work; or couldn\'t it?';
$bad_words = array('could', 'would', 'should');
$good_words = array('might', 'will');
function replace_words($matches){
global $good_words;
return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
}
echo preg_replace_callback('/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i', 'replace_words', $input_string);
Okay, so the code above builds a single regex pattern consisting of all of the bad words and passes it to preg_replace_callback. Matches will then be in the format:
/(START OR WORD_BOUNDARY OR WHITE_SPACE)(BAD_WORD)(WORD_BOUNDARY OR WHITE_SPACE OR END)/i
The i modifier makes it case insensitive so both bad and Bad would match.
The function replace_words then takes the matched word and its boundaries (either blank or a white space character) and replaces it with the boundaries and a random good word.
global $good_words; <-- Makes the $good_words variable accessible from within the function
$matches[1] <-- The word boundary before the matched word
$matches[3] <-- The word boundary after the matched word
$good_words[rand(0, count($good_words)-1)] <-- Selects a random good word from $good_words
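As a possible alternative to the rand(0, count(...)-1) index arithmetic, PHP's built-in array_rand returns a random key from an array. A minimal sketch, where pick_random_word is a hypothetical helper name and not part of the code above:

```php
<?php
// Hypothetical helper: returns one random element of a non-empty array.
function pick_random_word(array $words)
{
    // array_rand() returns a random key, which is then used to index the array.
    return $words[array_rand($words)];
}

$good_words = array('might', 'will');
echo pick_random_word($good_words), "\n"; // prints either "might" or "will"
```

This avoids the off-by-one risk of computing the upper index by hand, and it also works for arrays whose keys are not a contiguous 0..n-1 range.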
You could rewrite the above as a one-liner by passing an anonymous function to preg_replace_callback:
echo preg_replace_callback(
'/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i',
function ($matches) use ($good_words){
return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
},
$input_string
);
If you're going to use it multiple times, you may also write it as a self-contained function. In that case you'll most likely want to pass the good/bad words into the function when calling it (or hard-code them in there permanently), but that depends on how you derive them:
function clean_string($input_string, $bad_words, $good_words){
return preg_replace_callback(
'/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i',
function ($matches) use ($good_words){
return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
},
$input_string
);
}
echo clean_string($input_string, $bad_words, $good_words);
Running the above functions consecutively with the input and word lists shown in the first example:
This will be interesting but might it be? Perhaps this 'will' work; or couldn't it?
This might be interesting but might it be? Perhaps this 'might' work; or couldn't it?
This might be interesting but will it be? Perhaps this 'will' work; or couldn't it?
Of course the replacement words are chosen randomly so if I refreshed the page I'd get something else... But this shows what does/doesn't get replaced.
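The functions above always insert the replacement exactly as it is written in $good_words, so "Could" and "could" get the same lower-case replacement. If the replacement should follow the capitalization of the matched word, as the question asks, one possible approach is sketched below. The names match_case and clean_string_keep_case are hypothetical, and the sketch assumes ASCII words only:

```php
<?php
// Hypothetical helper: copy the capitalization style of $original onto $replacement.
function match_case($original, $replacement)
{
    // All letters upper case (and at least one letter present): "BAD" -> "WORDS"
    if ($original === strtoupper($original) && $original !== strtolower($original)) {
        return strtoupper($replacement);
    }
    // First letter upper case (e.g. "Bad" or "BAd"): "Bad" -> "Words"
    $first = substr($original, 0, 1);
    if ($first !== strtolower($first)) {
        return ucfirst($replacement);
    }
    // Lower-case first letter (e.g. "bad" or "bAd"): replacement is left unchanged.
    return $replacement;
}

// Hypothetical variant of clean_string() that keeps the case of the matched word.
function clean_string_keep_case($input_string, array $bad_words, array $good_words)
{
    // Escape regex metacharacters and the '/' delimiter in every bad word.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $bad_words);

    return preg_replace_callback(
        '/(^|\b|\s)(' . implode('|', $escaped) . ')(\b|\s|$)/i',
        function ($matches) use ($good_words) {
            $replacement = $good_words[array_rand($good_words)];
            // $matches[2] is the bad word exactly as the user typed it.
            return $matches[1] . match_case($matches[2], $replacement) . $matches[3];
        },
        $input_string
    );
}

echo clean_string_keep_case('Could it? COULD it. could it!', array('could'), array('might')), "\n";
// prints: Might it? MIGHT it. might it!
```

Only three styles are recognised here (all upper case, leading capital, everything else); mixed-case input such as "bAd" is not reproduced letter by letter.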
$bad_words
foreach($bad_words as $key=>$word){
    // Escape regex metacharacters; '/' is the pattern delimiter used above
    $bad_words[$key] = preg_quote($word, '/');
}
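To see what the escaping step does, here is a small sketch that applies preg_quote to a made-up word list using array_map instead of a loop. The sample words are assumptions chosen to contain regex metacharacters; passing '/' as the second argument also escapes the delimiter used in the patterns above:

```php
<?php
// Sample list (assumed for illustration) containing regex metacharacters.
$bad_words = array('$h1t', 'a.b', 'c/d');

// Escape every word so it is matched literally inside a /.../ pattern.
$escaped = array_map(function ($word) {
    return preg_quote($word, '/');
}, $bad_words);

print_r($escaped);
// The three entries become: \$h1t   a\.b   c\/d
```

Without this step, a word such as a.b would also match "axb", and a word containing / would terminate the pattern early.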
\b
In this code I've used \b, \s, and ^ or $ as word boundaries, and there is a good reason for this. While white space, start of string, and end of string are all considered word boundaries, \b will not match in all cases, for example:
\b\$h1t\b <---Will not match
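This is because \b only matches at a position between a word character (i.e. [a-zA-Z0-9_]) and a non-word character, and characters like $ don't count as word characters. A space followed by $ is two non-word characters in a row, so there is no boundary there.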
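The difference can be checked directly with preg_match. This is a small sketch with an assumed sample sentence; the second pattern uses the same boundary groups as the code earlier in this answer:

```php
<?php
$text = 'this is $h1t here';

// \b alone: there is no word boundary between the space and "$", so no match.
var_dump(preg_match('/\b\$h1t\b/', $text));                // int(0)

// With the extra alternatives, \s matches the space before "$".
var_dump(preg_match('/(^|\b|\s)\$h1t(\b|\s|$)/', $text));  // int(1)
```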
Depending on the size of your word list there are a couple of potential hiccups. From a system design perspective it's generally bad form to have huge regexes for a couple of reasons:
Given that the regex pattern is compiled by PHP, the first reason is negated. The second should be negated as well; if your word list is large, with a dozen permutations of each bad word, then I suggest you stop and rethink your approach (read: use a flagging/moderation system).
To clarify, I don't see a problem with having a small word list to filter out specific expletives, as it serves a purpose: to stop users from having an outburst at one another. The problem comes when you try to filter out too much, including permutations. Stick to filtering common swear words and if that doesn't work then - for the last time - implement a flagging/moderation system.
Upvotes: 11
Reputation: 13404
I came up with this method and it's working fine. It returns true if the input is one of the bad words.
Example:
function badWordsFilter($inputWord) {
$badWords = Array("bad","words","here");
for($i=0;$i<count($badWords);$i++) {
if($badWords[$i] == strtolower($inputWord))
return true;
}
return false;
}
Usage:
if (badWordsFilter("bad")) {
echo "Bad word was found";
} else {
echo "No bad words detected";
}
As the word 'bad' is blacklisted, it will echo "Bad word was found".
EDIT 1:
As suggested by rid, it's also possible to do a simple in_array check:
function badWordsFilter($inputWord) {
$badWords = Array("bad","words","here");
if(in_array(strtolower($inputWord), $badWords) ) {
return true;
}
return false;
}
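Both versions of badWordsFilter compare the whole input against the list, so they only detect a bad word when the input is exactly that word. To check a full comment, one option is to split it into words first. A rough sketch, assuming the list holds lower-case ASCII words; containsBadWord is a hypothetical name:

```php
<?php
// Hypothetical helper: true if any word of $text appears in $badWords.
function containsBadWord($text, array $badWords)
{
    // Lower-case the text, then split on anything that is not a letter or digit.
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Any overlap between the tokens and the list means a bad word was used.
    return count(array_intersect($tokens, $badWords)) > 0;
}

$badWords = array("bad", "words", "here");
var_dump(containsBadWord("I see BAD things coming!", $badWords)); // bool(true)
var_dump(containsBadWord("Nothing to flag", $badWords));          // bool(false)
```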
EDIT 2:
As I promised, I came up with a slightly different idea for replacing bad words with good words, as you mentioned in your question. I hope it will help you a bit, but this is the best I can offer at the moment, as I'm not entirely sure what you're trying to do.
Example:
1. Let's combine an array with bad and good words into one
$wordsTransform = array(
'shit' => 'ship'
);
2. Your imaginary user input
$string = "Rolling In The Deep by Adel\n
\n
There's a fire starting in my heart\n
Reaching a fever pitch, and it's bringing me out the dark\n
Finally I can see you crystal clear\n
Go ahead and sell me out and I'll lay your shit bare";
3. Replacing bad words with good words
$string = strtr($string, $wordsTransform);
4. Getting the desired output
Rolling In The Deep
There's a fire starting in my heart
Reaching a fever pitch, and it's bringing me out the dark
Finally I can see you crystal clear
Go ahead and sell me out and I'll lay your ship bare
EDIT 3:
Following the correct comment from Wrikken: I had totally forgotten that strtr is case sensitive and that it's better to respect word boundaries. I have borrowed the following example from PHP: strtr - Manual and modified it slightly.
It is the same idea as in my second edit, but it is case-insensitive, it checks for word boundaries, and it puts a backslash in front of every character that is part of the regular expression syntax:
1. Method:
//
// Written by Patrick Rauchfuss; class renamed from String, which PHP 7+ reserves as a class name
class StringUtil
{
    public static function stritr(&$string, $from, $to = NULL)
    {
        if(is_string($from))
            // preg_quote() escapes regex special characters in the search word
            $string = preg_replace('/\b' . preg_quote($from, '/') . '\b/i', $to, $string);
        else if(is_array($from))
        {
            foreach ($from as $key => $val)
                self::stritr($string, $key, $val);
        }
        return $string; // $string is also modified in place (passed by reference)
    }
}
2. An array with bad and good words
$wordsTransform = array(
    'shit' => 'ship'
);
3. Replacement
StringUtil::stritr($string, $wordsTransform);
Upvotes: 5