limestreetlab

Reputation: 328

Count occurrences of specific word after a different, specific word is found

I am rather new to regex and am stuck on the following: I am trying to use preg_match_all to count the occurrences of hello that come after world.

If I use "world".+(hello), the match runs all the way to the last hello; "world".*?(hello) stops at the first hello. Either way I get a count of one.

blah blah blah
hello
blah blah blah
class="world" 
blah blah blah
hello 
blah blah
hello
blah blah blah
hello
blah blah blah

I am expecting 3 as the count because the hello before world should not be counted.

Upvotes: 3

Views: 364

Answers (4)

Casimir et Hippolyte

Reputation: 89557

Another way: force the pattern to fail, without being retried, if world does not occur in the string:

~(?:\A(*COMMIT).*?world)?.*?hello~s


The non-capturing group is optional but greedy, so it is attempted first each time the pattern is tried.
It begins with the \A anchor, which matches only at the start of the string, so that is the only position where the group can succeed. At any later position \A fails and, since the group is optional, the rest of the group is skipped and the search continues with .*?hello.
Immediately after \A comes the backtracking control verb (*COMMIT): if something after it fails and the engine has to backtrack past it, the whole match fails and the pattern is not retried at any other position.

In other words, if this group fails at the start of the string, the search is aborted once and for all.

Advantage: it needs fewer steps than a \G-based pattern.
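
As a rough check of the behaviour described above, the pattern can be run against the sample text from the question and against a copy in which world has been replaced. The variable names and the replacement word "earth" are only illustrative assumptions, not part of the original answer.

```php
<?php
// Sample text from the question.
$text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";

// Pattern from the answer: the optional group can only succeed at \A,
// and (*COMMIT) aborts the whole search if "world" is never found.
$pattern = '~(?:\A(*COMMIT).*?world)?.*?hello~s';

// preg_match_all() returns the number of full-pattern matches.
$withWorld = preg_match_all($pattern, $text);

// Illustrative copy without "world": the group fails after (*COMMIT),
// so the search stops and nothing is counted.
$withoutWorld = preg_match_all($pattern, str_replace('world', 'earth', $text));

echo $withWorld, "\n";
echo $withoutWorld, "\n";
```

The first call should count only the hello occurrences after world; the second should report no match at all, because the search is abandoned at the start of the string.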


To be more efficient, a \G-based pattern can also be written this way (using an optional group instead of an alternation):

~(?:\A.*?world)?(?!\A).*?hello~sA

Here the A modifier takes the role of the \G anchor: it is exactly the same as starting each branch of the pattern (only one here) with the \G anchor.
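
A minimal sketch of how the anchored variant behaves, again using the question's sample text. The variable names and the "earth" replacement are assumptions made for illustration.

```php
<?php
// Sample text from the question.
$text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";

// The A modifier anchors every match attempt at the current offset:
// offset 0 for the first attempt, then the end of the previous match.
$pattern = '~(?:\A.*?world)?(?!\A).*?hello~sA';

$count = preg_match_all($pattern, $text);

// Illustrative copy without "world": at offset 0 the optional group is
// skipped, (?!\A) then fails, and an anchored pattern cannot move on.
$countNoWorld = preg_match_all($pattern, str_replace('world', 'earth', $text));

echo $count, "\n";
echo $countNoWorld, "\n";
```

Because the pattern is anchored, a failure at offset 0 ends the search immediately, which is what keeps the hello before world from being counted.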

Upvotes: 1

Wiktor Stribiżew

Reputation: 626754

You can use a single preg_match_all call here:

$text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";
echo preg_match_all('~(?:\G(?!^)|\bworld\b).*?\K\bhello\b~s', $text);

Details:

  • (?:\G(?!^)|\bworld\b) - either the end of the previous match or the whole word world. \G matches at the start of the string or at the end of the previous match, so the (?!^) negative lookahead is added to exclude the start-of-string position
  • .*? - any zero or more chars, as few as possible
  • \K - discards all text matched so far
  • \bhello\b - a whole word hello.

NOTE: If you do not need word boundary check, you may remove \b from the pattern.

If hello and world are user-supplied literal strings, you must escape them with preg_quote before inserting them into the pattern:

$start = "world";
$find = "hello";
$text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";
echo preg_match_all('~(?:\G(?!^)|' . preg_quote($start, '~') . '\b).*?\K' . preg_quote($find, '~') . '~s', $text);
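
The same idea can be wrapped in a small helper. This is only a sketch: the function name count_after is hypothetical, and it deliberately leaves out all word boundaries so that search strings beginning or ending with non-word characters still work.

```php
<?php
// Hypothetical helper (not from the original answer): counts occurrences
// of $find that appear after the first occurrence of $start.
function count_after(string $text, string $start, string $find): int
{
    // Both strings are escaped, including the ~ delimiter.
    $pattern = '~(?:\G(?!^)|' . preg_quote($start, '~') . ').*?\K'
             . preg_quote($find, '~') . '~s';

    $count = preg_match_all($pattern, $text);

    // preg_match_all() returns false on error; treat that as zero here.
    return $count === false ? 0 : $count;
}

$text = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";

echo count_after($text, 'world', 'hello'), "\n";
```

Since the inputs go through preg_quote, a search string such as a.b is matched literally rather than as "a, any character, b".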

Upvotes: 1

bobble bubble

Reputation: 18490

Another option with simple regexes:

if(preg_match('/"world".*/s', $str, $out)) {
  echo preg_match_all('/\bhello\b/', $out[0]);
}

See demo at tio.run
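
If both words are plain literal strings and no word-boundary check is needed, the same "cut, then count" idea might also be done without regex at all. This is a swapped-in technique rather than the code above, and the function name count_after_plain is hypothetical.

```php
<?php
// Hypothetical helper: plain string functions instead of regex.
// Note: substr_count() counts raw substrings (no word boundaries)
// and requires a non-empty $find.
function count_after_plain(string $text, string $start, string $find): int
{
    $pos = strpos($text, $start);
    if ($pos === false) {
        return 0; // $start never occurs, so nothing is counted
    }

    // Count only from the position just past the first $start.
    return substr_count($text, $find, $pos + strlen($start));
}

$str = "blah blah blah\nhello\nblah blah blah\nclass=\"world\" \nblah blah blah\nhello \nblah blah\nhello\nblah blah blah\nhello\nblah blah blah";

echo count_after_plain($str, '"world"', 'hello'), "\n";
```

The trade-off is that a word such as helloworld would also be counted, which the \bhello\b pattern above avoids.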

Upvotes: 2

Tim Biegeleisen

Reputation: 521168

One approach might be to first strip off the leading portion of the string up to, and including, the first occurrence of world. Then call preg_match_all as you already are doing and get the count of occurrences of hello.

$input = "blah blah blah
hello
blah blah blah
class=\"world\" 
blah blah blah
hello 
blah blah
hello
blah blah blah
hello
blah blah blah";

// The s modifier lets . match newlines, so the lazy .*? can reach the first world.
$input = preg_replace("/^.*?\bworld/s", "", $input);
preg_match_all("/\bhello\b/", $input, $matches);
echo count($matches[0]);  // 3

Upvotes: 0
