Reputation: 27058
i have some simple code that does a preg match:
$bad_words = array('dic', 'tit', 'fuc',); //for this example i replaced the bad words
for($i = 0; $i < sizeof($bad_words); $i++)
{
if(preg_match("/$bad_words[$i]/", $str, $matches))
{
$rep = str_pad('', strlen($bad_words[$i]), '*');
$str = str_replace($bad_words[$i], $rep, $str);
}
}
echo $str;
So, if $str
was "dic"
the result will be '*' and so on.
Now there is a small problem if $str == f.u.c
. The solution might be to use:
$pattern = '~f(.*)u(.*)c(.*)~i';
$replacement = '***';
$foo = preg_replace($pattern, $replacement, $str);
In this case i will get ***
, in any case. My issue is putting all this code together.
I've tried:
$pattern = '~f(.*)u(.*)c(.*)~i';
$replacement = 'fuc';
$fuc = preg_replace($pattern, $replacement, $str);
$bad_words = array('dic', 'tit', $fuc,);
for($i = 0; $i < sizeof($bad_words); $i++)
{
if(preg_match("/$bad_words[$i]/", $str, $matches))
{
$rep = str_pad('', strlen($bad_words[$i]), '*');
$str = str_replace($bad_words[$i], $rep, $str);
}
}
echo $str;
The idea is that $fuc
becomes fuc
then I place it in the array then the array does its jobs, but this doesn't seem to work.
Upvotes: 0
Views: 549
Reputation: 59709
First of all, you can do all of the bad word replacements with one (dynamically generated) regex, like this:
$bad_words = array('dic', 'tit', 'fuc',);
$str = preg_replace_callback("/\b(?:" . implode( '|', $bad_words) . ")\b/",
function( $match) {
return str_repeat( '*', strlen( $match[0]));
}, $str);
Now, you have the problem of people adding periods in between the word, which you can search for with another regex and replace them as well. However, you must keep in mind that .
matches any character in a regex, and must be escaped (with preg_quote()
or a backslash).
$bad_words = array_map( function( $el) {
return implode( '\.', str_split( $el));
}, $bad_words);
This will create a $bad_words
array similar to:
array(
'd\.i\.c',
't\.i\.t',
'f\.u\.c'
)
Now, you can use this new $bad_words
array just like the above one to replace these obfuscated ones.
Hint: You can make this array_map()
call "better" in the sense that it can be smarter to catch more obfuscations. For example, if you wanted to catch a bad word separated with either a period or a whitespace character or a comma, you can do:
$bad_words = array_map( function( $el) {
return implode( '(?:\.|\s|,)', str_split( $el));
}, $bad_words);
Now if you make that obfuscation group optional, you'll catch a lot more bad words:
$bad_words = array_map( function( $el) {
return implode( '(?:\.|\s|,)?', str_split( $el));
}, $bad_words);
Now, bad words should match:
f.u.c
f,u.c
f u c
fu c
f.uc
And many more.
Upvotes: 3