Reputation: 7094
I have the following text in $text:
$text = 'Hello world, lorem ipsum.
What?
Hello world, lorem ipsum what.
Excuse me!';
If a line has fewer than 3 words, I want to remove that line completely. So the lines What? and Excuse me! should be removed from the string.
Is there a regex approach, or how else should I go about this?
Upvotes: 3
Views: 74
Reputation: 785316
You can use this negative lookahead regex:
preg_replace('/^(?!(?:\h*\S+\h+){2}\S+).*\R*/m', '', $text);
Output:
Hello world, lorem ipsum.
Hello world, lorem ipsum what.
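The negative lookahead (?!(?:\h*\S+\h+){2}\S+) succeeds only at the start of a line that doesn't contain 3 whitespace-separated words, so .* then matches that line and it is removed. \R matches any newline sequence (\n, \r\n or \r) in PHP regex.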
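To make the word threshold explicit, the same lookahead can be built from a parameter. This is a minimal sketch, not part of the answer: removeShortLines and $minWords are hypothetical names, and it assumes $minWords is at least 2.

```php
<?php
// Hypothetical helper: remove every line that has fewer than $minWords words.
// Assumes $minWords >= 2; with 3 it produces the same pattern as above.
function removeShortLines(string $text, int $minWords = 3): string
{
    // (?:\h*\S+\h+){N-1}\S+ requires N words separated by horizontal whitespace.
    $pattern = sprintf('/^(?!(?:\h*\S+\h+){%d}\S+).*\R*/m', $minWords - 1);
    return preg_replace($pattern, '', $text);
}

$text = "Hello world, lorem ipsum.\nWhat?\nHello world, lorem ipsum what.\nExcuse me!";
echo removeShortLines($text);
```

Because \R* only consumes the newline that follows a removed line, a removed last line leaves the preceding newline in place; wrap the call in rtrim() if that matters.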
Without a lookahead, use preg_grep:
echo implode("\n", preg_grep('/^\h*(?:\S+\h+){2}\S+/', explode("\n", $text)));
Hello world, lorem ipsum.
Hello world, lorem ipsum what.
Upvotes: 2
Reputation: 417
I came up with this. I prefer to avoid regex when possible, as it can be slower than plain string functions for a simple check like this.
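$str = 'Hello world, lorem ipsum.
What?
Hello world, lorem ipsum what.';
$new_str = explode("\n", $str);
foreach ($new_str as $keys => &$lines) {
    $lines = trim($lines);
    // Fewer than 2 spaces means fewer than 3 words (assumes single spaces between words).
    if (substr_count($lines, " ") < 2) {
        unset($new_str[$keys]);
    }
}
unset($lines); // break the reference left over from the foreach
$new_str = implode("\n", $new_str);
print_r($new_str);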
Which prints out this:
Hello world, lorem ipsum.
Hello world, lorem ipsum what.
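One caveat: substr_count counts space characters, so a line such as "Excuse  me!" (two consecutive spaces) would pass as three words. A possible variant, still without regex, counts the non-empty pieces instead. This is only a sketch; keepLinesWithMinWords is a hypothetical name, and like the code above it treats only the space character as a word separator.

```php
<?php
// Hypothetical helper: keep only the lines that contain at least $minWords words.
function keepLinesWithMinWords(string $str, int $minWords = 3): string
{
    $kept = [];
    foreach (explode("\n", $str) as $line) {
        $line = trim($line);
        // explode() yields empty strings between consecutive spaces; drop them.
        $words = array_filter(explode(' ', $line), 'strlen');
        if (count($words) >= $minWords) {
            $kept[] = $line;
        }
    }
    return implode("\n", $kept);
}

echo keepLinesWithMinWords("Hello world, lorem ipsum.\nWhat?\nExcuse  me!");
```

Collecting the kept lines in a new array also avoids unsetting elements while iterating by reference.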
Upvotes: 3
Reputation: 350365
You could use this regular expression in preg_replace:
$test = preg_replace("/^(?!\h*\S+\h+\S+\h+\S+).*$\R?/m", "", $text);
Testing with input that touches on some additional boundary conditions:
$text = 'Hello world, lorem ipsum.
What? ending-spaces
Hello world, lorem
Hello world, lorem ipsum what.
ending text';
$test = preg_replace("/^(?!\h*\S+\h+\S+\h+\S+).*$\R?/m", '', $text);
echo $test;
Output:
Hello world, lorem ipsum.
Hello world, lorem
Hello world, lorem ipsum what.
The (?! part looks ahead to see whether, after some optional horizontal blanks (\h*), there are three words (\S+) separated by (horizontal) blanks (\h+); if so, the pattern does not match (so the line is not removed).
In all other cases .*$ matches everything up to the end of the line, \R? also consumes the line break if present, and the whole match is replaced by an empty string in order to remove that line.
The m modifier makes ^ and $ match the beginning and end of each line respectively (instead of the beginning and end of the complete string).
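The effect of the m modifier can be checked by running the same pattern with and without it. This is a small sketch on made-up input; $sample, $withoutM and $withM are illustrative names only.

```php
<?php
// Illustrative input: the middle line has fewer than three words.
$sample = "one two three\nshort\nfour five six";

// Without m, ^ only matches at the very start of the string. The first line
// has three words, so the lookahead rejects the only candidate position.
$withoutM = preg_replace('/^(?!\h*\S+\h+\S+\h+\S+).*$\R?/', '', $sample);

// With m, ^ is tried at the start of every line, so "short" is removed.
$withM = preg_replace('/^(?!\h*\S+\h+\S+\h+\S+).*$\R?/m', '', $sample);

var_dump($withoutM === $sample);
var_dump($withM);
```

Here the first call leaves the text unchanged, while the second one drops the short middle line together with its line break.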
Here is a fiddle using the above input and regex.
Upvotes: 1