Patrioticcow

Reputation: 27058

How to do a preg_replace on a string in PHP?

I have some simple code that does a preg_match:

$bad_words = array('dic', 'tit', 'fuc',); //for this example i replaced the bad words

for($i = 0; $i < sizeof($bad_words); $i++)
{
    if(preg_match("/$bad_words[$i]/", $str, $matches))
    {
        $rep = str_pad('', strlen($bad_words[$i]), '*');
        $str = str_replace($bad_words[$i], $rep, $str);
    }
}
echo $str;

So, if $str was "dic" the result will be '***', and so on.

Now there is a small problem if $str is "f.u.c". The solution might be to use:

$pattern = '~f(.*)u(.*)c(.*)~i';
$replacement = '***';
$foo =  preg_replace($pattern, $replacement, $str);

In this case I will get ***, whatever the input is. My issue is putting all this code together.

I've tried:

$pattern = '~f(.*)u(.*)c(.*)~i';
$replacement = 'fuc';
$fuc =  preg_replace($pattern, $replacement, $str);

$bad_words = array('dic', 'tit', $fuc,); 

for($i = 0; $i < sizeof($bad_words); $i++)
{
    if(preg_match("/$bad_words[$i]/", $str, $matches))
    {
        $rep = str_pad('', strlen($bad_words[$i]), '*');
        $str = str_replace($bad_words[$i], $rep, $str);
    }
}
echo $str;

The idea is that $fuc becomes "fuc", then I place it in the array and the loop does its job, but this doesn't seem to work.

Upvotes: 0

Views: 549

Answers (1)

nickb

Reputation: 59709

First of all, you can do all of the bad word replacements with one (dynamically generated) regex, like this:

$bad_words = array('dic', 'tit', 'fuc',);

$str = preg_replace_callback("/\b(?:" . implode( '|', $bad_words) . ")\b/", 
    function( $match) {
        return str_repeat( '*', strlen( $match[0])); 
    }, $str);
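
To see what the \b word boundaries do here, a minimal sketch with a made-up input string ($str below is only an assumption for illustration): the standalone word is masked, but the same letters inside a longer word are left alone.

```php
<?php
$bad_words = array('dic', 'tit', 'fuc');
$str = 'a dic in a dictionary'; // hypothetical input

// One alternation built from the whole list, wrapped in word boundaries.
$str = preg_replace_callback('/\b(?:' . implode('|', $bad_words) . ')\b/',
    function ($match) {
        // Mask the match with as many asterisks as it has characters.
        return str_repeat('*', strlen($match[0]));
    }, $str);

echo $str; // a *** in a dictionary
```

If you also want to catch the letters inside longer words, dropping the \b anchors would do that, at the cost of more false positives.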

Now you have the problem of people adding periods between the letters of a word. You can search for those with another regex and replace them as well. Keep in mind, however, that . matches any character in a regex, so a literal period must be escaped (with preg_quote() or a backslash).

$bad_words = array_map( function( $el) { 
    return implode( '\.', str_split( $el));
}, $bad_words);

This will create a $bad_words array similar to:

array(
    'd\.i\.c',
    't\.i\.t',
    'f\.u\.c'
)

Now, you can use this new $bad_words array just like the above one to replace these obfuscated ones.
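
Putting the two pieces together might look like the sketch below. The input string and the $dotted variable name are assumptions for illustration; the rest is the same array_map() and preg_replace_callback() shown above.

```php
<?php
$bad_words = array('dic', 'tit', 'fuc');
$str = 'what the f.u.c is this'; // hypothetical input

// Build the period-separated patterns, e.g. 'fuc' becomes 'f\.u\.c'.
$dotted = array_map(function ($el) {
    return implode('\.', str_split($el));
}, $bad_words);

// Same replacement as before, only with the new patterns.
$str = preg_replace_callback('/\b(?:' . implode('|', $dotted) . ')\b/',
    function ($match) {
        return str_repeat('*', strlen($match[0]));
    }, $str);

echo $str; // what the ***** is this
```

Note that the periods are masked too, because the asterisks are generated from the length of the whole match.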

Hint: You can make this array_map() call "better" in the sense that it can be smarter to catch more obfuscations. For example, if you wanted to catch a bad word separated with either a period or a whitespace character or a comma, you can do:

$bad_words = array_map( function( $el) { 
    return implode( '(?:\.|\s|,)', str_split( $el));
}, $bad_words);

Now if you make that obfuscation group optional, you'll catch a lot more bad words:

$bad_words = array_map( function( $el) { 
    return implode( '(?:\.|\s|,)?', str_split( $el));
}, $bad_words);

Now, bad words should match:

f.u.c
f,u.c
f u c 
fu c
f.uc

And many more.
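
Since the separator group is optional, the same pattern also matches the plain words, so a single pass should be enough. A rough sketch of the whole thing wrapped in a function (censor() is a hypothetical name, not a PHP built-in, and the preg_quote() call on each letter is my own addition in case a list entry ever contains a regex metacharacter):

```php
<?php
// censor() is a hypothetical helper, not part of PHP.
function censor(array $bad_words, $str)
{
    $patterns = array_map(function ($el) {
        // Escape each character so it is always treated literally.
        $letters = array_map(function ($ch) {
            return preg_quote($ch, '/');
        }, str_split($el));
        // Allow an optional period, whitespace character, or comma between letters.
        return implode('(?:\.|\s|,)?', $letters);
    }, $bad_words);

    return preg_replace_callback('/\b(?:' . implode('|', $patterns) . ')\b/',
        function ($match) {
            return str_repeat('*', strlen($match[0]));
        }, $str);
}

echo censor(array('dic', 'tit', 'fuc'), 'f,u.c and fu c and fuc'), "\n";
// ***** and **** and ***
```

Keep in mind that str_split() and strlen() work on bytes, so this sketch assumes single-byte (ASCII) words.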

Upvotes: 3
