neil

Reputation: 387

How to remove (most) short words from a string

I'm currently using the following regex to remove short words (fewer than 4 characters) from a string.

$dirty = "I welcome you to San Diego";
$clean = preg_replace("/\b[^\s]{1,3}\b/", "", $dirty);

This results in "welcome Diego" (ignoring leftover whitespace).

However, I now need to exclude certain words from being removed, for instance:

$ignore = array("San", "you");

which should result in "welcome you San Diego".

Upvotes: 3

Views: 2424

Answers (2)

webbiedave

Reputation: 48887

I recommend using a callback (preg_replace_callback) as it allows a more maintainable solution if you have to scale to a large number of words:

echo preg_replace_callback(
    '/\b[^\s]{1,3}\b/',
    create_function(
        '$matches',
        '$ignore = array("San", "you");
         if (in_array($matches[0], $ignore)) {
            return $matches[0];
         } else {
            return \'\';
         }'
    ),
    "I welcome you to San Diego"
);
// output (apart from leftover spaces): welcome you San Diego

If you're using PHP 5.3 or greater, you can use an anonymous function instead of create_function (which was deprecated in PHP 7.2 and removed in PHP 8.0).
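For reference, here is a minimal sketch of the same callback written as an anonymous function, which is the only option on PHP 8.0 and later, where create_function no longer exists. The helper name remove_short_words and the final whitespace cleanup are my own assumptions for illustration, not part of the original answer; the regex is the one used above.

```php
<?php
// Hypothetical helper (name is an assumption): removes words of 1-3
// characters unless they appear in the $ignore list.
function remove_short_words($text, array $ignore)
{
    $result = preg_replace_callback(
        '/\b[^\s]{1,3}\b/',
        function ($matches) use ($ignore) {
            // Keep the word if it is on the ignore list, otherwise drop it.
            return in_array($matches[0], $ignore, true) ? $matches[0] : '';
        },
        $text
    );

    // Assumption: collapse the gaps left behind by removed words.
    return trim(preg_replace('/\s+/', ' ', $result));
}

echo remove_short_words("I welcome you to San Diego", array("San", "you"));
// prints: welcome you San Diego
```

The `use ($ignore)` clause passes the ignore list into the closure, so it no longer has to be hard-coded inside the callback body. Note that in_array compares case-sensitively here, so "san" would still be removed.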

Upvotes: 5

mario

Reputation: 145482

You can embed your ignore list using a (?!..) negative lookahead assertion:

 $clean = preg_replace("/\b(?!San|you|not)\w{1,3}\b/", "", $dirty);

Also, I would use \w instead of [^\s] so that it only matches word characters.
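If the ignore list lives in an array, as in the question, the lookahead can be assembled from it. The following is only a sketch under that assumption; the function name strip_short_words and the $maxLen parameter are hypothetical. It also adds a \b inside the lookahead, which the pattern above does not have, so that ignoring a shorter word such as "to" does not accidentally protect "too".

```php
<?php
// Hypothetical helper (name and $maxLen are assumptions): builds the
// negative lookahead from an ignore list and removes other short words.
function strip_short_words($text, array $ignore, $maxLen = 3)
{
    // Escape each word so regex metacharacters are taken literally.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $ignore);
    $alternatives = implode('|', $quoted);

    // The \b inside the lookahead makes it block whole words only.
    $lookahead = $alternatives === '' ? '' : '(?!(?:' . $alternatives . ')\b)';
    $pattern = '/\b' . $lookahead . '\w{1,' . (int) $maxLen . '}\b/';

    return preg_replace($pattern, '', $text);
}

$clean = strip_short_words("I welcome you to San Diego", array("San", "you"));
```

As with the original regex, the removed words leave their surrounding spaces behind, so you may want to collapse repeated whitespace and trim the result afterwards. Matching is case-sensitive unless you add the i modifier.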

Upvotes: 9
