Tomasz

Reputation: 1368

PHP preg_match - catching full 'words' (even if they start with a special character)

I have a PHP function that checks whether a string contains specific (full) 'words' from an array (some of these 'words' may start with a special character followed by a space OR end with a space). The problem is with 'words' that start with special characters, for example: +, -, /, $, # etc. Why doesn't this 'contains' function catch such words? I added preg_quote to it and it still doesn't work.


$bads = array('+11'," - 68",'[img','$cool ', "# hash"); 
// disallowed full 'words'; some may start with a special character + space or end with a space; if one of them appears in the string, the function should return true

$s= 'This is +11 test to show if $cool or [img works but it does $cool not';
//another example to test: $s= 'This - 68 is # hash not';

if(contains($s,$bads)) {
    echo 'Contains! ';
}

#### FUNCTION ###

function contains($str, $bads)
{
    foreach($bads as $a) {
        $a=preg_quote($a,'/');
        if(preg_match("/\b".$a."\b/",$str)) return true;
    }
    return false;
}

Upvotes: 1

Views: 464

Answers (3)

bishop

Reputation: 39494

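Intuition breaks down when applying a word boundary (\b) to a pattern that contains non-word characters: \b only matches between a word character and a non-word character, so it cannot match between a space and a symbol such as + or $. More on that here. What you seem to want, for this case, is \s: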

function contains($str, $bads)
{
    $template = '/(\s+%1$s\s+|^\s*%1$s\s+|\s+%1$s\s*$|^\s*%1$s\s*$)/';
    foreach ($bads as $a) {
        $regex = sprintf($template, preg_quote($a, '/'));
        if (preg_match($regex, $str)) {
            return true;
        }
    }
    return false;
}

See it in action at 3v4l.org.

The regex checks for four different cases, each separated by |:

  1. One or more whitespace characters, the bad pattern, then one or more whitespace characters.
  2. Start of input, zero or more whitespace characters, the bad pattern, then one or more whitespace characters.
  3. One or more whitespace characters, the bad pattern, zero or more whitespace characters, then end of input.
  4. Start of input, zero or more whitespace characters, the bad pattern, zero or more whitespace characters, then end of input.
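
As an aside, the four cases can probably be collapsed into a single pattern with two lookarounds. This is only a sketch, not the code from the answer above: contains_ws is a hypothetical name, and it assumes that leading or trailing spaces inside a bad word are not significant, so each word is trimmed first.

```php
<?php
// Sketch: the four whitespace cases expressed with two lookarounds.
// contains_ws() is a hypothetical name, not part of the original answer.
function contains_ws(string $str, array $bads): bool
{
    foreach ($bads as $bad) {
        // Assumption: surrounding spaces in a bad word are not significant.
        $word = trim($bad);
        if ($word === '') {
            continue; // skip empty entries
        }
        // (?<!\S): not preceded by a non-whitespace character (or at start).
        // (?!\S):  not followed by a non-whitespace character (or at end).
        $regex = '/(?<!\S)' . preg_quote($word, '/') . '(?!\S)/';
        if (preg_match($regex, $str) === 1) {
            return true;
        }
    }
    return false;
}
```

Because the lookarounds consume no characters, start and end of input need no separate alternatives.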

If you could guarantee that all of your bad patterns contained only word characters - [0-9A-Za-z_] - then \b would work just fine. Since that is not true here, you need to deploy a more explicit pattern.
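
To illustrate the point about \b, here is a small check that can be run as-is. The variable names and sample strings are made up for the demonstration.

```php
<?php
// \b only matches between a word character and a non-word character.
$withSymbol = '/\b' . preg_quote('+11', '/') . '\b/';  // becomes /\b\+11\b/
$wordOnly   = '/\b' . preg_quote('cool', '/') . '\b/'; // becomes /\bcool\b/

// Space then "+": both are non-word characters, so there is no boundary.
$afterSpace = preg_match($withSymbol, 'This is +11 test');   // 0

// "a" then "+": a word character next to a non-word character, so \b matches.
$afterLetter = preg_match($withSymbol, 'a+11 test');         // 1

// A plain word surrounded by spaces: \b behaves as expected.
$plainWord = preg_match($wordOnly, 'so cool today');         // 1
```

The first call is the situation from the question: the word is there, but \b has nothing to anchor to in front of the +.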

Upvotes: 1

Andreas

Reputation: 23968

This is the best I can do.

https://3v4l.org/C8KqP

So build a single string containing the regex, and if a word starts with $, do not add \b around it.
I guess this has to be modified to fit your code, but you can see the concept.
Also, since I run only one regex with all the words, it's more efficient than checking one word at a time.

$bads = array('+11','- 68','[img','$cool', '# hash'); // disallowed full 'words'; if one of them appears in string, the function should return true

$s= 'This is test to show if or $cool works but it does not';
//another example to test: $s= 'This - 68 is # hash not';

if(contains($s,$bads)) {
    echo 'Contains! ';
}

#### FUNCTION ###

function contains($str, $bads)
{
    $b = "/";
    foreach($bads as $a) {
        if(substr($a,0,1) == "$"){
            $b .= preg_quote($a,'/'). "|";
        }else{
            $b .= "\b" . preg_quote($a,'/'). "\b|";
        }
    }
    $b = substr($b, 0,-1) ."/";
    if(preg_match($b,$str, $m)){
        return true;    
    } 

    return false;
}
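
The one-regex idea can also be combined with whitespace lookarounds instead of \b, so that words starting with any symbol are handled the same way. This is a sketch under assumptions, not the code from this answer: find_bad_word is a hypothetical helper, and it trims each word on the assumption that surrounding spaces are not significant.

```php
<?php
// Sketch: one alternation for all words, delimited by whitespace
// lookarounds instead of \b. find_bad_word() is a hypothetical helper.
function find_bad_word(string $str, array $bads): ?string
{
    $quoted = [];
    foreach ($bads as $bad) {
        $bad = trim($bad); // assumption: surrounding spaces are not significant
        if ($bad !== '') {
            $quoted[] = preg_quote($bad, '/');
        }
    }
    if ($quoted === []) {
        return null; // nothing to search for
    }
    // Each word must sit between whitespace or the ends of the string.
    $pattern = '/(?<!\S)(?:' . implode('|', $quoted) . ')(?!\S)/';

    // Return the first matched word, or null when nothing matches.
    return preg_match($pattern, $str, $m) === 1 ? $m[0] : null;
}
```

Returning the matched word rather than a boolean is a design choice for the sketch; it makes it easier to see which entry triggered the match.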

Upvotes: 0

Nigel Ren

Reputation: 57141

There are a few changes...

<?php
error_reporting ( E_ALL );
ini_set ( 'display_errors', 1 );
$bads = array("+11","- 68","[img",'$cool', "# hash"); 
// disallowed full 'words'; if one of them appears in string, 
// the function should return true

$s= 'This is +11 test to show if $cool or [img works but it does $cool not';
$s= 'This - 68 is # hash not';

if(contains($s,$bads)) {
    echo 'Contains! ';
}

#### FUNCTION ###

function contains($str, $bads)
{
    foreach($bads as $a) {
        $a=preg_quote($a,'/');
        if(preg_match("/$a/",$str)) return true;
    }
    return false;
}

I've used single quotes around the $cool value so that PHP does not try to interpolate it as a variable, and kept / as the preg_quote delimiter because it has to match the delimiter used in the pattern. I also removed the \b's from the preg_match, as some options are effectively multiple words.
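
One thing to keep in mind: without any boundary the check becomes a plain substring search, so a 'word' also matches when it is embedded in a longer token. A small sketch of that behaviour, where contains_anywhere is a hypothetical name for a boundary-free check and the sample strings are made up:

```php
<?php
// Hypothetical boundary-free check, renamed to avoid clashing with contains().
function contains_anywhere(string $str, array $bads): bool
{
    foreach ($bads as $bad) {
        // No \b and no whitespace checks: any occurrence counts.
        if (preg_match('/' . preg_quote($bad, '/') . '/', $str) === 1) {
            return true;
        }
    }
    return false;
}

// Matches as a standalone word ...
$standalone = contains_anywhere('This is +11 test', ['+11']);  // true
// ... and also when it is only part of a longer token.
$embedded = contains_anywhere('total+115 items', ['+11']);     // true
```

If substring matching is what you want, strpos($str, $bad) !== false gives the same result without a regex.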

Upvotes: 0
