Reputation: 8071
I would like to know how I can remove long words from a string, i.e. words whose length is greater than n.
I tried the following:
//remove words which have more than 5 characters from string
$s = 'abba bbbbbbbbbbbb 1234567 zxcee ytytytytytytytyt zczc xyz';
echo preg_replace("~\s(.{5,})\s~isU", " ", $s);
This gives the following output, which is incorrect:
abba 1234567 ytytytytytytytyt zczc xyz
Upvotes: 4
Views: 1585
Reputation: 45589
<?php
//remove words which have 5 or more characters from string
$s = 'abba bbbbbbbbbbbb 1234567 zxcee ytytytytytytytyt zczc xyz';
$patterns = array(
'long_words' => '/[^\s]{5,}/',
'multiple_spaces' => '/\s{2,}/'
);
$replacements = array(
'long_words' => '',
'multiple_spaces' => ' '
);
echo trim(preg_replace($patterns, $replacements, $s));
?>
Output:
abba zczc xyz
Update, to address the issue you raised in the comments, you can do it like this:
<?php
//remove words which have 5 or more characters from string
$s = '123&nbsp;ReallyLongStringComesHere&nbsp;123';
$patterns = array(
'html_space' => '/&nbsp;/',
'long_words' => '/[^\s]{5,}/',
'multiple_spaces' => '/\s{2,}/'
);
$replacements = array(
'html_space' => ' ',
'long_words' => '',
'multiple_spaces' => ' '
);
echo str_replace(' ', '&nbsp;', trim(preg_replace($patterns, $replacements, $s)));
?>
Output:
123&nbsp;123
Upvotes: 2
Reputation: 96266
Summary:
\s will fail to remove words at the beginning and the end of the string (and you should use a test string that fails with these!).
\b doesn't fail like that, but it won't remove the whitespace. You can combine it with the suggested double-space remover, but that won't preserve whitespace that was duplicated in the original (this may not be a problem).
Alternatively, use two patterns: one starting with \b and ending with \s, and another one starting with \s and ending with $.
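To make the difference concrete, here is a small sketch. The test string and the variable names $a, $b and $c are my own, chosen so that long words sit at both ends of the string; the patterns are simplified versions of the ones discussed in this thread.

```php
<?php
// Test string (an assumption for illustration) with long words at the start and the end.
$s = 'bbbbbbbbbbbb abba 1234567 zczc ytytytytytytytyt';

// Whitespace on both sides is required, so the first and last words survive.
$a = preg_replace('~\s\S{5,}\s~', ' ', $s);
echo $a, "\n"; // bbbbbbbbbbbb abba zczc ytytytytytytytyt

// Word boundaries catch all three long words but leave the surrounding spaces behind.
$b = preg_replace('~\b\w{5,}\b~', '', $s);

// Collapse the leftover runs of whitespace and trim the ends.
$c = trim(preg_replace('~\s{2,}~', ' ', $b));
echo $c, "\n"; // abba zczc
```

Note that the cleanup step also collapses any double spaces that were already in the input, which is the trade-off mentioned above.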
Upvotes: 0
Reputation: 31564
You're close:
$s = preg_replace("~\w{5,}~", "", $s);
Working codepad example: http://codepad.org/c5AN1E6M
Also, you'll want to collapse multiple spaces into one:
$s = preg_replace("~ +~", " ", $s);
Upvotes: 0
Reputation: 2446
A better approach may be to use regular string manipulation instead of a regex. A simple explode/implode with strlen will do nicely. It depends on the size of your string, of course, but for your example it should be fine.
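A minimal sketch of that idea follows. The function name remove_long_words and the $maxLen parameter are hypothetical, made up for this example, and it assumes words are separated by plain spaces.

```php
<?php
// Hypothetical helper: keeps only the words of at most $maxLen bytes.
function remove_long_words(string $text, int $maxLen): string
{
    $kept = array_filter(
        explode(' ', $text),
        function (string $word) use ($maxLen): bool {
            // Drop empty pieces (from repeated spaces) and words that are too long.
            // strlen counts bytes; mb_strlen may be a better fit for multibyte text.
            return $word !== '' && strlen($word) <= $maxLen;
        }
    );
    return implode(' ', $kept);
}

echo remove_long_words('abba bbbbbbbbbbbb 1234567 zxcee ytytytytytytytyt zczc xyz', 4), "\n";
// prints: abba zczc xyz
```

Because empty pieces are filtered out, the result never contains double spaces, so no separate cleanup pass is needed.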
Upvotes: 1
Reputation: 56182
Use this regex: \b\w{5,}\b. It will match long words.
\b - word boundary
\w{5,} - word character (alphanumeric or underscore), 5 or more repetitions
\b - word boundary
Upvotes: 5