Imran Omar Bukhsh

Reputation: 8071

Removing long words regex

I would like to know how to remove long words from a string, i.e. words longer than n characters.

I tried the following:

//remove words which have more than 5 characters from string
$s = 'abba bbbbbbbbbbbb 1234567 zxcee ytytytytytytytyt zczc xyz';
echo preg_replace("~\s(.{5,})\s~isU", " ", $s);

Gives the Output (which is incorrect):

abba 1234567 ytytytytytytytyt zczc xyz

Upvotes: 4

Views: 1585

Answers (6)

Shef

Reputation: 45589

<?php
//remove words which have 5 or more characters from string
$s = 'abba bbbbbbbbbbbb 1234567 zxcee ytytytytytytytyt zczc xyz';

$patterns = array(
    'long_words' => '/[^\s]{5,}/',
    'multiple_spaces' => '/\s{2,}/'
);

$replacements = array(
    'long_words' => '',
    'multiple_spaces' => ' '
);
// preg_replace() pairs patterns and replacements by array order; the keys are only labels
echo trim(preg_replace($patterns, $replacements, $s));
?>

Output:

abba zczc xyz

Update: to address the issue you raised in the comments, you can do it like this:

<?php
//remove words which have 5 or more characters from string
$s = '123&nbsp;ReallyLongStringComesHere&nbsp;123';

$patterns = array(
    'html_space' => '/&nbsp;/',
    'long_words' => '/[^\s]{5,}/',
    'multiple_spaces' => '/\s{2,}/'
);

$replacements = array(
    'html_space' => ' ',
    'long_words' => '',
    'multiple_spaces' => ' '
);
echo str_replace(' ', '&nbsp;', trim(preg_replace($patterns, $replacements, $s)));
?>

Output:

123&nbsp;123

Upvotes: 2

Karoly Horvath

Reputation: 96266

Summary:

  • any pattern that starts or ends with \s will fail to remove words at the beginning and the end of the string (and you should use a test string that fails with these!)
  • \b doesn't fail like that, but it won't remove the whitespace. You can combine it with one of the suggested double-space removers, but that won't preserve runs of whitespace that were in the original (this may not be a problem).
  • explode+implode has the nice property that it preserves duplicated whitespace, but you have to do it for every whitespace character.
  • an alternative that preserves whitespace (which I haven't seen here) is to use two patterns: one starting with \b and ending with \s, and another one starting with \s and ending with $.
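
To make the first and last bullets concrete, here is a minimal sketch. The test string and the exact patterns are my own reading of the summary, not something taken from the other answers, so treat them as assumptions.

```php
<?php
// Test string with a long word at each end and a double space in the middle.
$s = 'bbbbbbbb abba  1234567 zczc ytytytyt';

// Pattern wrapped in \s on both sides: it cannot match the first or last word.
$naive = preg_replace('/\s\w{5,}\s/', ' ', $s);

// Two-pattern variant (one possible reading of the last bullet):
//   1. a long word followed by a single whitespace character
//   2. a long word at the very end, together with the whitespace before it
$patterns = array('/\b\w{5,}\s/', '/\s\w{5,}$/');
$preserved = preg_replace($patterns, '', $s);

echo '[' . $naive . "]\n";     // [bbbbbbbb abba  zczc ytytytyt]
echo '[' . $preserved . "]\n"; // [abba  zczc]
```

The double space after abba survives in the second result. A string that consists of a single long word matches neither pattern, so that case would need separate handling.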

Upvotes: 0

monsieur_h

Reputation: 1380

Add the global modifier g or use preg_match_all().

Upvotes: 0

Gabi Purcaru

Reputation: 31564

You're close:

$s = preg_replace("~\w{5,}~", "", $s);

Working codepad example: http://codepad.org/c5AN1E6M

Also, you'll want to collapse multiple spaces into one:

$s = preg_replace("~ +~", " ", $s);

Example for this one

Upvotes: 0

Edgar Velasquez Lim

Reputation: 2446

A better approach may be to use regular string manipulation instead of a regex. A simple explode/implode with strlen will do nicely. It depends on the size of your string, of course, but for your example it should be fine.
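
A rough sketch of what that could look like. The function name dropLongWords and the $minLen parameter are made up for illustration, and it assumes words are separated by plain spaces only.

```php
<?php
// Hypothetical helper: drops every space-separated word with $minLen or more characters.
function dropLongWords($s, $minLen = 5)
{
    $words = explode(' ', $s);

    // Keep short words. Empty strings (produced by consecutive spaces) are kept too,
    // so runs of spaces between surviving words are preserved.
    $kept = array_filter($words, function ($w) use ($minLen) {
        return strlen($w) < $minLen;
    });

    return implode(' ', $kept);
}

echo dropLongWords('abba bbbbbbbbbbbb 1234567 zxcee ytytytytytytytyt zczc xyz');
// abba zczc xyz
```

Note that strlen() counts bytes, so for multibyte text mb_strlen() (from the mbstring extension) would probably be the better fit.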

Upvotes: 1

Kirill Polishchuk

Reputation: 56182

Use this regex: \b\w{5,}\b. It will match long words.

  1. \b - word boundary
  2. \w{5,} - 5 or more word characters (letters, digits, or underscore)
  3. \b - word boundary
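
A minimal sketch of how this pattern could be used with preg_replace. The helper name removeLongWords and the $minLen parameter are hypothetical, and the whitespace clean-up step is borrowed from the other answers rather than being part of the regex itself.

```php
<?php
// Hypothetical helper: removes words of $minLen or more word characters,
// then collapses the leftover whitespace.
function removeLongWords($s, $minLen = 5)
{
    // Build \b\w{N,}\b with the requested minimum length.
    $pattern = '/\b\w{' . (int) $minLen . ',}\b/';
    $withoutLong = preg_replace($pattern, '', $s);

    // The removed words leave their surrounding spaces behind, so squeeze and trim.
    return trim(preg_replace('/\s{2,}/', ' ', $withoutLong));
}

echo removeLongWords('abba bbbbbbbbbbbb 1234567 zxcee ytytytytytytytyt zczc xyz');
// abba zczc xyz
```

For "longer than n characters" as asked in the question, you would pass n + 1 as $minLen.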

Upvotes: 5
