André Cardoso

Reputation: 115

Replace only whole word matches not preceded by a hash symbol

I'm sanitizing a string by removing strings found in this array:

$regex = array("subida", " de"," do", " da", "em", " na", " no", "blitz");

And this is the str_replace() that I'm using:

for ($i = 0;$i < 8; $i++){
    $twit = str_replace($regex[$i], '', $twit);
}

How do I make it remove a word only when it matches a whole word in the string? For example, I have the following phrase:

#blitz na subida do alfabarra blitz

It returns:

# alfabarra

I don't want the first blitz to be removed, because it is preceded by a hash (#). I want the output to be:

#blitz alfabarra

Upvotes: 3

Views: 245

Answers (4)

mickmackusa

Reputation: 47863

To sanitize your Portuguese phrase as desired using your set of words, each word will need to be programmatically prepared for the regex engine.

If the word starts with a space (an entirely insignificant word), then there is no need for a leading word boundary; simply escape any special characters and append a word boundary.

If the word does not start with a space, then:

  • optionally match a leading space,
  • require a word boundary before the word, and
  • do not allow a preceding hash symbol.

Then escape special characters and append a word boundary.

Code: (Demo)

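$subpatterns = array("subida", " de", " do", " da", "em", " na", " no", "blitz");
$str = '#blitz na subida do alfabarra blitz';

// Build one alternative per word; words without a leading space get
// an optional space, a word boundary, and a "not after #" lookbehind.
$subs = array_map(
    fn($v) => (str_starts_with($v, ' ') ? '' : ' ?\b(?<!#)')
        . preg_quote($v, '~') . '\b',
    $subpatterns
);

echo preg_replace(
    '~' . implode('|', $subs) . '~u',
    '',
    $str
);
// #blitz alfabarra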

I have added the u pattern modifier in case multibyte characters come into play. Your sample text doesn't indicate the possibility of uppercase characters, so I have not added the i (case-insensitive) modifier.
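If uppercase characters could appear after all, one option is to add the i modifier alongside u. The following is only a sketch under that assumption: the input string is made up for illustration, and the rest mirrors the code above (str_starts_with() needs PHP 8).

```php
<?php
$subpatterns = array("subida", " de", " do", " da", "em", " na", " no", "blitz");
// Hypothetical mixed-case input, not taken from the question.
$str = '#Blitz NA Subida do alfabarra BLITZ';

// Same per-word preparation as above; '~' is passed to preg_quote()
// because it is the pattern delimiter.
$subs = array_map(
    fn($v) => (str_starts_with($v, ' ') ? '' : ' ?\b(?<!#)')
        . preg_quote($v, '~') . '\b',
    $subpatterns
);

// "i" makes the match case-insensitive; "u" treats pattern and subject as UTF-8.
$result = preg_replace('~' . implode('|', $subs) . '~iu', '', $str);

echo $result;
// #Blitz alfabarra
```

The hashtag keeps its original casing because it is never matched; only the listed words are removed.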

Upvotes: 0

Dawit

Reputation: 11

Try this:

// Iterate over every entry instead of hard-coding the array length.
foreach ($regex as $word) {
    if (is_string($word)) {
        // Note: str_replace() still matches substrings, not whole words.
        $twit = str_replace($word, '', $twit);
    }
}

Upvotes: 0

cmbuckley

Reputation: 42458

I couldn't come up with a catch-all regex solution, but the following may be useful:

$words = array("subida", " de", " do", " da", "em", " na", " no", "blitz");
$words = array_map('trim', $words);

$str = '#blitz *blitz ablitz na subida do alfabarra blitz# blitz blitza';

$str_words = explode(' ', $str);
$str_words = array_diff($str_words, $words);
$str = implode(' ', $str_words);
var_dump($str);

This gets round a few complications with word boundaries in regex-based solutions.
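For reuse, the same token-based idea could be wrapped in a small function. This is a minimal sketch; the name remove_listed_tokens is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper: removes whole, space-delimited tokens found in $words.
function remove_listed_tokens(string $str, array $words): string
{
    // Trim the list entries so " na" and "na" are treated alike.
    $words = array_map('trim', $words);
    // Split on single spaces; a token such as "#blitz" stays in one piece.
    $tokens = explode(' ', $str);
    // array_diff() keeps only the tokens that are not in the list.
    return implode(' ', array_diff($tokens, $words));
}

$words = array("subida", " de", " do", " da", "em", " na", " no", "blitz");
echo remove_listed_tokens('#blitz na subida do alfabarra blitz', $words);
// #blitz alfabarra
```

Because the comparison is per token, #blitz, blitz# and blitza are left untouched. The trade-off is that a listed word with punctuation attached, such as "blitz,", is also left in place.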

Upvotes: 1

alex

Reputation: 490143

This assumes that none of your strings have / in them. If any do, run preg_quote() explicitly with / as the second argument.

It also assumes you want to match whole words, so I trimmed each word.

$words = array("subida", " de"," do", " da", "em", " na", " no", "blitz");

$words = array_map('trim', $words);

$words = array_map('preg_quote', $words);

$str = preg_replace('/\b[^#](?:' . implode('|', $words) . ')\b/', '', $str);

Codepad.
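One possible variation, offered as a sketch rather than as this answer's own method: [^#] consumes whichever character precedes the word, so a listed word at the very start of the string is not matched, and a token like ablitz would be removed entirely. A zero-width negative lookbehind, (?<!#), avoids consuming that character. The whitespace clean-up step is an addition of this sketch, needed because the surrounding spaces are no longer consumed.

```php
<?php
$words = array("subida", " de", " do", " da", "em", " na", " no", "blitz");

// Trim each entry, then escape it for use inside a /-delimited pattern.
$quoted = array_map(
    fn($w) => preg_quote(trim($w), '/'),
    $words
);

// (?<!#) rejects a match that directly follows "#" without consuming anything.
$pattern = '/(?<!#)\b(?:' . implode('|', $quoted) . ')\b/';

$str = '#blitz na subida do alfabarra blitz';
$str = preg_replace($pattern, '', $str);

// Collapse the runs of spaces left behind by the removed words.
$str = trim(preg_replace('/\s+/', ' ', $str));

echo $str;
// #blitz alfabarra
```

If the input may contain multibyte characters, adding the u modifier (as another answer here does) is probably worth considering.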

Upvotes: 5
