Reputation: 328
I am rather new to regex and am stuck on the following, where I try to use preg_match_all to count the number of hello occurrences after world.
If I use "world".+(hello), it matches up to the last hello; "world".*?(hello) stops at the first hello. Both give a count of one.
blah blah blah
hello
blah blah blah
class="world"
blah blah blah
hello
blah blah
hello
blah blah blah
hello
blah blah blah
I am expecting 3 as the count, because the hello before world should not be counted.
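For reference, a minimal reproduction of the two attempts. It assumes the s modifier is set so that . can cross line breaks (in the sample, world and hello sit on different lines); $text is simply the sample above written as a PHP string.

```php
<?php
// The sample text from the question as a PHP string.
$text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\"\n"
      . "blah blah blah\nhello\nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";

// Greedy: .+ runs to the end of the text, then backtracks to the LAST hello.
$greedy = preg_match_all('~"world".+(hello)~s', $text);

// Lazy: .*? stops at the FIRST hello after "world".
$lazy = preg_match_all('~"world".*?(hello)~s', $text);

// "world" occurs only once, so each pattern can match only once.
echo $greedy, ' ', $lazy, "\n";
```

Both counts come out as 1 because every match has to start at "world", and there is only one of those.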
Upvotes: 3
Views: 364
Reputation: 89557
Another way: force the pattern to fail, without any retry, if world doesn't exist in the string:
~(?:\A(*COMMIT).*?world)?.*?hello~s
The non-capturing group is optional but greedy. Consequently, it is tested each time the pattern is tried.
It begins with the \A anchor that matches the start of the string, so this is the only position where this group can succeed. At any other position \A fails and, since the group is optional, the rest of the group is skipped and the search continues with .*?hello.
Immediately after \A comes the backtracking control verb (*COMMIT): if the pattern fails after passing it, the whole match fails and no other start position is tried.
In other words, if this group fails at the start of the string, the search is aborted once and for all.
Advantage: it needs fewer steps than a \G based pattern.
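A small sketch of how the (*COMMIT) pattern behaves on two inputs, one containing world and one without it. The variable names and sample strings are made up for illustration.

```php
<?php
$pattern = '~(?:\A(*COMMIT).*?world)?.*?hello~s';

// Illustrative inputs: one hello before world, three after it.
$withWorld    = "hello\nclass=\"world\"\nhello\nblah\nhello\nblah\nhello";
// No world at all: the group fails after (*COMMIT), so nothing is retried.
$withoutWorld = "hello\nblah\nhello";

$countWith    = preg_match_all($pattern, $withWorld);
$countWithout = preg_match_all($pattern, $withoutWorld);

echo $countWith, ' ', $countWithout, "\n";
```

The first match spans from the start of the string to the first hello after world; each later match starts where the previous one ended.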
To be more efficient, a \G based pattern can also be written this way (using an optional group instead of an alternation):
~(?:\A.*?world)?(?!\A).*?hello~sA
Here the A modifier takes the role of the \G anchor: it is exactly the same as starting each branch of the pattern (only one here) with the \G anchor.
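The anchored variant can be tried the same way. This is only a sketch with made-up sample strings; it relies on preg_match_all starting each new attempt at the end of the previous match, which is where the A modifier pins it.

```php
<?php
$pattern = '~(?:\A.*?world)?(?!\A).*?hello~sA';

// Illustrative inputs: one hello before world, three after it.
$withWorld    = "hello\nclass=\"world\"\nhello\nblah\nhello\nblah\nhello";
// Without world the optional group is skipped at offset 0, (?!\A) then fails,
// and the anchoring prevents any attempt at a later offset.
$withoutWorld = "hello\nblah\nhello";

$countWith    = preg_match_all($pattern, $withWorld);
$countWithout = preg_match_all($pattern, $withoutWorld);

echo $countWith, ' ', $countWithout, "\n";
```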
Upvotes: 1
Reputation: 626754
You can use a single preg_match_all call here:
$text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";
echo preg_match_all('~(?:\G(?!^)|\bworld\b).*?\K\bhello\b~s', $text);
See the regex demo and the PHP demo. Details:
(?:\G(?!^)|\bworld\b) - either the end of the previous match or a whole word world. \G matches either the start of the string or the end of the previous match, so the start-of-string position has to be excluded, which is what the (?!^) negative lookahead does
.*? - any zero or more chars, as few as possible
\K - discards all text matched so far
\bhello\b - a whole word hello.
NOTE: If you do not need the word boundary check, you may remove \b from the pattern.
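To see what \K changes, here is a short sketch that runs the same pattern with and without it. The sample string and variable names are made up for illustration.

```php
<?php
// Illustrative input: one hello before world, three after it.
$text = "hello\nclass=\"world\"\nhello\nblah\nhello\nblah\nhello";

// With \K: everything consumed before hello is dropped from the reported match.
preg_match_all('~(?:\G(?!^)|\bworld\b).*?\K\bhello\b~s', $text, $withK);

// Without \K: each reported match also contains the text skipped over by .*?
preg_match_all('~(?:\G(?!^)|\bworld\b).*?\bhello\b~s', $text, $withoutK);

print_r($withK[0]);
print_r($withoutK[0]);
```

The count is the same in both cases; \K only matters when the matched text itself is used afterwards.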
If hello and world are user-supplied literal strings, you must preg_quote them in the pattern:
$start = "world";
$find = "hello";
$text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";
echo preg_match_all('~(?:\G(?!^)|' . preg_quote($start, '~') . '\b).*?\K' . preg_quote($find, '~') . '~s', $text);
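The quoting matters as soon as the strings contain regex metacharacters. Below is a sketch that wraps the idea in a helper; countAfter is a hypothetical name, not something from this answer, and the word boundaries are left out because they do not make sense next to non-word characters.

```php
<?php
// Hypothetical helper: counts literal $find occurrences after the first literal $start.
function countAfter(string $text, string $start, string $find): int
{
    // preg_quote escapes regex metacharacters and the ~ delimiter.
    $pattern = '~(?:\G(?!^)|' . preg_quote($start, '~') . ').*?\K'
             . preg_quote($find, '~') . '~s';

    // preg_match_all returns false on error; the cast turns that into 0.
    return (int) preg_match_all($pattern, $text);
}

// Illustrative input: "x+y" appears once before the marker and twice after it.
$text  = "x+y before\nmarker: a.b\nx+y then x+y\nxzy";
$count = countAfter($text, 'a.b', 'x+y');

echo $count, "\n";
```

Without preg_quote, the + in x+y would act as a quantifier and the . in a.b would match any character.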
Upvotes: 1
Reputation: 18490
Another option with simple regexes:
if (preg_match('/"world".*/s', $str, $out)) {
    echo preg_match_all('/\bhello\b/', $out[0]);
}
Upvotes: 2
Reputation: 521168
One approach might be to first strip off the leading portion of the string up to, and including, the first occurrence of world. Then call preg_match_all as you already are doing and get the count of occurrences of hello.
$input = "blah blah blah
hello
blah blah blah
class=\"world\"
blah blah blah
hello
blah blah
hello
blah blah blah
hello
blah blah blah";
$input = preg_replace("/^.*?\bworld/s", "", $input); // s modifier: let . match newlines so the prefix can span lines
preg_match_all("/\bhello\b/", $input, $matches);
echo sizeof($matches[0]); // 3
Upvotes: 0