Reputation: 1368
I have a PHP function that checks whether a string contains specific (full) 'words' from an array (some of these 'words' may start with a special character followed by a space, or end with a space). The problem is with 'words' that start with special characters, for example +, -, /, $, # etc. Why doesn't this 'contains' function catch such words? I added preg_quote to it and it still doesn't work.
$bads = array('+11'," - 68",'[img','$cool ', "# hash");
// disallowed full 'words'; some may start with a special character + space or end with a space; if one of them appears in the string, the function should return true
$s= 'This is +11 test to show if $cool or [img works but it does $cool not';
//another example to test: $s= 'This - 68 is # hash not';
if (contains($s, $bads)) {
    echo 'Contains! ';
}
#### FUNCTION ###
function contains($str, $bads)
{
    foreach ($bads as $a) {
        $a = preg_quote($a, '/');
        if (preg_match("/\b" . $a . "\b/", $str)) return true;
    }
    return false;
}
Upvotes: 1
Views: 464
Reputation: 39494
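Intuition breaks down when applying a word boundary to a pattern that contains non-word characters: \b matches only between a word character and a non-word character, so it cannot match between a space and a +. More on that here. What you seem to want, for this case, is \s: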
function contains($str, $bads)
{
    $template = '/(\s+%1$s\s+|^\s*%1$s\s+|\s+%1$s\s*$|^\s*%1$s\s*$)/';
    foreach ($bads as $a) {
        $regex = sprintf($template, preg_quote($a, '/'));
        if (preg_match($regex, $str)) {
            return true;
        }
    }
    return false;
}
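The regex checks for four different cases, each separated by |: the word with whitespace on both sides, the word at the start of the string followed by whitespace, the word at the end of the string preceded by whitespace, and the word as the entire string (optionally padded with whitespace).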
If you could guarantee that all of your bad patterns contained only word characters ([0-9A-Za-z_]), then \b would work just fine. Since that is not true here, you need to deploy a more explicit pattern.
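The four alternatives in the template can also be expressed with lookarounds. This is only a minimal sketch, assuming the bad words should be delimited by whitespace or the string edges, and assuming that leading or trailing spaces in the list entries are just delimiters and can be trimmed. The name contains_ws is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: whitespace-delimited match using lookarounds.
// (?<!\S) succeeds at the start of the string or right after whitespace.
// (?!\S)  succeeds at the end of the string or right before whitespace.
function contains_ws(string $str, array $bads): bool
{
    foreach ($bads as $bad) {
        // Assumption: outer spaces in an entry are delimiters, so trim them.
        $quoted = preg_quote(trim($bad), '/');
        if (preg_match('/(?<!\S)' . $quoted . '(?!\S)/', $str) === 1) {
            return true;
        }
    }
    return false;
}

$bads = ['+11', ' - 68', '[img', '$cool ', '# hash'];
var_dump(contains_ws('This is +11 test', $bads));        // bool(true)
var_dump(contains_ws('This - 68 is # hash not', $bads)); // bool(true)
var_dump(contains_ws('A+11B and $cooler', $bads));       // bool(false)
```

Unlike the template, the lookarounds do not consume the surrounding whitespace, which matters only if you later want the matched text itself.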
Upvotes: 1
Reputation: 23968
This is the best I can do.
Build a single string with the regex and, if a word starts with $, do not add \b.
I guess this has to be modified to fit your code, but you can see the concept.
Also, since I only run one regex with all the words, it should be more efficient than checking one word at a time.
$bads = array('+11','- 68','[img','$cool', '# hash'); // disallowed full 'words'; if one of them appears in string, the function should return true
$s= 'This is test to show if or $cool works but it does not';
//another example to test: $s= 'This - 68 is # hash not';
if (contains($s, $bads)) {
    echo 'Contains! ';
}
#### FUNCTION ###
function contains($str, $bads)
{
    $b = "/";
    foreach ($bads as $a) {
        if (substr($a, 0, 1) == "$") {
            $b .= preg_quote($a, '/') . "|";
        } else {
            $b .= "\b" . preg_quote($a, '/') . "\b|";
        }
    }
    $b = substr($b, 0, -1) . "/";
    if (preg_match($b, $str, $m)) {
        return true;
    }
    return false;
}
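The same concept can be extended to the other special characters (+, -, [, #), which \b cannot anchor either. The following is a sketch rather than a definitive fix: it adds \b only on a side where the entry starts or ends with a word character. The names build_pattern and contains_any are hypothetical, and trimming the entries is an assumption.

```php
<?php
// Hypothetical variant: \b can only match next to a word character,
// so add it only on the side(s) where the entry has one.
function build_pattern(array $bads): string
{
    $parts = [];
    foreach ($bads as $bad) {
        $bad = trim($bad); // assumption: outer spaces are not part of the word
        if ($bad === '') {
            continue;
        }
        $left  = preg_match('/^\w/', $bad) ? '\b' : '';
        $right = preg_match('/\w$/', $bad) ? '\b' : '';
        $parts[] = $left . preg_quote($bad, '/') . $right;
    }
    // One alternation for all entries, e.g. /\+11\b|\$cool\b/
    return '/' . implode('|', $parts) . '/';
}

function contains_any(string $str, array $bads): bool
{
    $pattern = build_pattern($bads);
    // An empty list would yield '//', which matches everything, so guard it.
    return $pattern !== '//' && preg_match($pattern, $str) === 1;
}

$bads = ['+11', '- 68', '[img', '$cool', '# hash'];
var_dump(contains_any('This - 68 is # hash not', $bads)); // bool(true)
var_dump(contains_any('it is $cooler here', $bads));      // bool(false)
```

Note that a side without \b is left unanchored, so an entry such as +11 would still match inside x+11; whether that is acceptable depends on the input.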
Upvotes: 0
Reputation: 57141
There are a few changes...
<?php
error_reporting ( E_ALL );
ini_set ( 'display_errors', 1 );
$bads = array("+11","- 68","[img",'$cool', "# hash");
// disallowed full 'words'; if one of them appears in string,
// the function should return true
$s= 'This is +11 test to show if $cool or [img works but it does $cool not';
$s= 'This - 68 is # hash not';
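if (contains($s, $bads)) {
    echo 'Contains! ';
}
#### FUNCTION ###
function contains($str, $bads)
{
    foreach ($bads as $a) {
        // The second argument must be the pattern delimiter, which is / here.
        $a = preg_quote($a, '/');
        if (preg_match("/$a/", $str)) return true;
    }
    return false;
}
I've used single quotes around the $cool value so PHP does not try to interpolate it as a variable, and kept / (the pattern delimiter) as the second argument to preg_quote. I also removed the \b's from the preg_match, as some options are effectively multiple words. Note that without \b this is a plain substring match, so '$cool' also matches inside '$cooler'.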
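Because the quoted pattern has no boundaries or anchors, the regex is doing a literal substring search. A minimal sketch of the same check without regex, assuming substring semantics are really what is wanted (the name contains_literal is hypothetical):

```php
<?php
// Hypothetical regex-free equivalent: a literal substring search per entry.
function contains_literal(string $str, array $bads): bool
{
    foreach ($bads as $bad) {
        // Skip empty entries; an empty needle is not a meaningful search.
        if ($bad !== '' && strpos($str, $bad) !== false) {
            return true;
        }
    }
    return false;
}

$bads = ['+11', '- 68', '[img', '$cool', '# hash'];
var_dump(contains_literal('This - 68 is # hash not', $bads)); // bool(true)
var_dump(contains_literal('a $cooler day', $bads));           // bool(true), substring match
```

This avoids quoting concerns entirely, at the cost of also matching inside longer tokens.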
Upvotes: 0