user7432810

Reputation: 695

preg_replace ignores non-letter characters when detecting words

I have an array of words and a string, and I want to add a hashtag to every word in the string that matches an entry in the array. I use this loop to find and replace the words:

foreach($testArray as $tag){
   $str = preg_replace("~\b".$tag."~i","#\$0",$str);
}

Problem: let's say I have the words "is" and "isolated" in my array. I get ##isolated in the output. This means that the word "isolated" is matched once for "is" and once for "isolated". The pattern ignores the fact that "#isolated" no longer starts with "is"; it starts with "#".

Here is an example, but it is only an example: I don't want to solve just this case but every other possibility as well:

$str = "this is isolated is an  example of this and that";
$testArray = array('is','isolated','somethingElse');

Output will be:

this #is ##isolated #is an  example of this and that
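
To see where the second # comes from, the loop above can be traced pass by pass. This is only a sketch of the code in the question; the $passes array is a hypothetical addition used to record the intermediate strings.

```php
<?php
// Reproduces the loop from the question and records the string after each pass.
$str = "this is isolated is an  example of this and that";
$testArray = array('is', 'isolated', 'somethingElse');

$passes = array(); // hypothetical helper array, only used for the trace
foreach ($testArray as $tag) {
    $str = preg_replace("~\b" . $tag . "~i", "#\$0", $str);
    $passes[$tag] = $str;
}

foreach ($passes as $tag => $result) {
    echo $tag, ' => ', $result, "\n";
}
```

After the "is" pass, "isolated" already carries one #. Because \b also matches between "#" and "i", the "isolated" pass matches the same word again and adds the second #.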

Upvotes: 1

Views: 47

Answers (2)

Casimir et Hippolyte

Reputation: 89557

One way to do that is to split your string into words and to build an associative array from your original array of words (to avoid the use of in_array):

$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');

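// Lowercased words become array keys, so each lookup is a single isset()
$hash = array_flip(array_map('strtolower', $testArray));

// Splitting on word boundaries puts the words at the odd indexes
$parts = preg_split('~\b~', $str);

for ($i=1; $i<count($parts); $i+=2) {
    $low = strtolower($parts[$i]);
    // Append '#' to the piece just before a matching word
    if (isset($hash[$low])) $parts[$i-1] .= '#';
}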

$result = implode('', $parts);

echo $result;

This way, your string is processed only once, whatever the number of words in your array.
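
If the same logic is needed in several places, it could be wrapped in a function. The sketch below assumes PHP 7+ for the type hints, and the name tagWords is hypothetical. It also shows two cases the example string does not cover: mixed case and text that starts with punctuation.

```php
<?php
// Hypothetical wrapper; the body follows the split-and-lookup approach.
function tagWords(string $str, array $words): string
{
    $hash = array_flip(array_map('strtolower', $words));
    $parts = preg_split('~\b~', $str);

    // $parts[0] is whatever precedes the first word boundary (possibly an
    // empty string), so words stay at odd indexes even after leading punctuation.
    for ($i = 1, $n = count($parts); $i < $n; $i += 2) {
        if (isset($hash[strtolower($parts[$i])])) {
            $parts[$i - 1] .= '#';
        }
    }
    return implode('', $parts);
}

echo tagWords('"Is" this isolated?', array('is', 'isolated')), "\n";
```

The lookup is case-insensitive because both the array keys and each word are lowercased before comparison, while the original casing in the string is left untouched.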

Upvotes: 1

Wiktor Stribiżew

Reputation: 626929

You may build a regex with an alternation group enclosed with word boundaries on both ends and replace all the matches in one pass:

$str = "this is isolated is an  example of this and that";
$testArray = array('is','isolated','somethingElse');
echo preg_replace('~\b(?:' . implode('|', $testArray) . ')\b~i', '#$0', $str);
// => this #is #isolated #is an  example of this and that

See the PHP demo.

The regex will look like

~\b(?:is|isolated|somethingElse)\b~i

See its online demo.
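
If the words may contain regex metacharacters such as a dot, each one can be escaped with preg_quote before the pattern is joined. This is a minimal sketch, assuming PHP 7+; the function name addHashtags and the empty-array guard are additions that are not part of the code above.

```php
<?php
// Hypothetical helper: builds the alternation pattern with every word escaped.
function addHashtags(string $str, array $words): string
{
    if ($words === []) {
        return $str; // an empty alternation would match at every word boundary
    }
    // The second argument also escapes the '~' delimiter used below.
    $escaped = array_map(function ($w) {
        return preg_quote($w, '~');
    }, $words);

    $pattern = '~\b(?:' . implode('|', $escaped) . ')\b~i';
    return preg_replace($pattern, '#$0', $str);
}

echo addHashtags("this is isolated is an  example of this and that",
                 array('is', 'isolated', 'somethingElse')), "\n";
echo addHashtags("a1b a.b", array('a.b')), "\n";
```

In the second call, the unescaped pattern would also tag "a1b", since a bare dot matches any character. Note that \b only helps for words that begin and end with a letter, digit or underscore.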

If you want to make your own approach work, you can add a negative lookbehind after \b: "~\b(?<!#)".$tag."~i","#\$0". The lookbehind fails any match that is immediately preceded by #. See this PHP demo.
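
Written out as a complete loop, the lookbehind variant might look like the sketch below. The function name tagWithLookbehind is hypothetical, and the preg_quote call is an extra precaution that the snippet above does not include.

```php
<?php
// Hypothetical wrapper around the original loop with the lookbehind added.
function tagWithLookbehind(string $str, array $tags): string
{
    foreach ($tags as $tag) {
        // (?<!#) rejects a match when the character just before it is '#',
        // so a word tagged in an earlier pass is not tagged again.
        $pattern = '~\b(?<!#)' . preg_quote($tag, '~') . '~i';
        $str = preg_replace($pattern, '#$0', $str);
    }
    return $str;
}

echo tagWithLookbehind("this is isolated is an  example of this and that",
                       array('is', 'isolated', 'somethingElse')), "\n";
```

Since this pattern has no trailing \b, a tag such as "is" still matches the start of a longer word like "island" even when that word is not in the array. Appending \b to the pattern is one way to prevent that.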

Upvotes: 1
