bagofmilk

Reputation: 1550

Remove whole blacklisted words from a string

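The problem with the code below is that it removes the blacklisted letter sequences wherever they appear, even inside other words (for example, "all" becomes "ll"), instead of removing only whole words.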

$str = "In a minute, remove all of the corks from these bottles in the cellar";

$useless_words = array("the", "of", "or", "in", "a");

$newstr = str_replace($useless_words, "", $str);

Output of above:

In  mute, remove ll   cks from se bottles   cellr

I need the output to be:

minute, remove all corks from these bottles cellar

I'm assuming I can't use str_replace(). What can I do to achieve this?

Upvotes: 0

Views: 61

Answers (3)

mickmackusa

Reputation: 48031

In addition to enforcing whole-word matches with word boundaries, use a conditional expression to consume either the leading or the trailing whitespace characters (but not both!).

Code:

$str = "The game start in a minute, remove all of the corks from these bottles in the cellar they're in";
$useless_words = ["the", "of", "or", "in", "a"];

var_dump(
    preg_replace('/(\s+)?\b(?:' . implode('|', $useless_words) . ')\b(?(1)|\s+)/i', '', $str)
);

Output:

string(69) "game start minute, remove all corks from these bottles cellar they're"

Notice that the leading "The" and its trailing space are removed, the final "in" and its leading space are removed, and "all of the corks" becomes "all corks", separated by only one space.
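The same conditional-subpattern approach can be wrapped in a reusable function. This is a minimal sketch: the function name remove_words() is hypothetical, and the preg_quote() call is an added assumption (not part of the answer above) in case the blacklist ever contains regex metacharacters such as "." or "+".

```php
<?php

/**
 * Hypothetical helper: removes whole blacklisted words (case-insensitively)
 * together with exactly one adjacent run of whitespace.
 */
function remove_words(string $str, array $words): string
{
    // Assumption: escape each word so characters like "." or "+" are matched literally
    $escaped = array_map(fn(string $w): string => preg_quote($w, '/'), $words);

    // (\s+)?      optionally capture the whitespace before the word as group 1
    // \b(?:...)\b match only a whole word from the blacklist
    // (?(1)|\s+)  if group 1 matched, consume nothing more; otherwise consume the trailing whitespace
    $pattern = '/(\s+)?\b(?:' . implode('|', $escaped) . ')\b(?(1)|\s+)/i';

    return preg_replace($pattern, '', $str);
}

$useless_words = ["the", "of", "or", "in", "a"];
echo remove_words("In a minute, remove all of the corks from these bottles in the cellar", $useless_words), "\n";
```

With the question's original string this should print the asker's desired output, "minute, remove all corks from these bottles cellar". One limitation of the conditional: a blacklisted word with no whitespace on either side (for example, a string that is just "in") is left alone, because when group 1 is unset the pattern requires trailing whitespace.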

Upvotes: 0

Toto

Reputation: 91518

preg_replace will do the job:

$str = "The game start in a minute, remove all of the corks from these bottles in the cellar";
$useless_words = array("the", "of", "or", "in", "a");
$pattern = '/\h+(?:' . implode('|', $useless_words) . ')\b/i';
$newstr = preg_replace($pattern, "", $str);
echo $newstr,"\n";

Output:

The game start minute, remove all corks from these bottles cellar

Explanation:

The pattern looks like this: /\h+(?:the|of|or|in|a)\b/i

/                   : regex delimiter
  \h+               : 1 or more horizontal spaces
  (?:               : start non capture group
    the|of|or|in|a  : alternatives for all the useless words
  )                 : end group
  \b                : word boundary, makes sure the word is not followed by another word character
/i                  : closing regex delimiter; the i flag makes the match case insensitive
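When the pattern is assembled from the word list, the argument order of implode() matters: the separator comes first. Passing the array first was deprecated in PHP 7.4 and is no longer supported in PHP 8.0. Below is a minimal sketch; the helper name build_pattern() is hypothetical, and escaping each word with preg_quote() is an added assumption rather than part of the answer.

```php
<?php

// Hypothetical helper: builds /\h+(?:word1|word2|...)\b/i from a word list
function build_pattern(array $words): string
{
    // Assumption: escape regex metacharacters, including the "/" delimiter
    $escaped = array_map(fn(string $w): string => preg_quote($w, '/'), $words);

    // implode(separator, array): the separator must come first
    return '/\h+(?:' . implode('|', $escaped) . ')\b/i';
}

$useless_words = ["the", "of", "or", "in", "a"];
$pattern = build_pattern($useless_words);

echo $pattern, "\n";
echo preg_replace($pattern, '', "The game start in a minute, remove all of the corks from these bottles in the cellar"), "\n";
```

Because \h+ requires whitespace before each word, a blacklisted word at the very start of the string (the leading "The" here) is kept, which matches the output shown above.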

Upvotes: 1

Mike M

Reputation: 488

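$useless_words = array(" the ", " of ", " or ", " in ", " a ");
$str = "In a minute, remove all of the corks from these bottles in the cellar";

// Replace each space-padded word with a single space so neighbouring words stay separated
$newstr = str_replace($useless_words, " ", $str);

// A blacklisted word at the very start has no leading space, so check for it separately
$trimmed_useless_words = array_map('trim', $useless_words);
$newstr2 = $newstr;
foreach ($trimmed_useless_words as $value) {
    $prefix = $value . ' ';
    // strncasecmp() returns 0 when $newstr starts with $prefix, ignoring case ("In " matches "in ")
    if (strncasecmp($newstr, $prefix, strlen($prefix)) === 0) {
        $newstr2 = substr($newstr, strlen($prefix));
        break;
    }
}
echo $newstr2;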

Upvotes: 1
