bigb055

Reputation: 308

Regex: Only the first match of word after a given word is returned

I have the test string "apple search from here apple, banana, apple." and the following regex:

(?i)(?<=search from here\s)(\bapple|banana|orange\b)(\s+(\bapple|banana|orange\b))*

It matches only the first occurrence of apple after the anchor text (see https://regex101.com/r/EQin6O/1). How do I get a match for each occurrence of apple after the "search from here" text?

Upvotes: 1

Views: 103

Answers (2)

VicenteJankowski

Reputation: 117

This should do the job:

(?:\G(?!\A)|search from here ).*?\K(apple|banana|orange)

See the demo: https://regex101.com/r/q3FGoD/1

Step by step:

  • \G - asserts the position at the end of the previous match, or at the start of the string on the first attempt
  • (?!\A) - negative lookahead for the start of the string; it stops \G from matching at the very beginning of the string
  • |search from here  - alternatively, matches the literal text "search from here " (including the trailing space); this provides the starting point for the first match
  • .*? - lazily allows any characters between that starting point and the captured group (apple|banana|orange)
  • \K - resets the start of the reported match, so everything matched before it is left out of the result
  • (apple|banana|orange) - finally captures one of the given words
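
JavaScript supports neither \G nor \K, so the pattern above cannot be used there as written. As a rough sketch of the same idea, the sticky flag (y) forces each attempt to start exactly where the previous one ended, much like \G, and reading capture group 1 plays the role of \K. The function name fruitsAfterAnchor is made up for this illustration.

```javascript
// Sketch only: emulates \G with the sticky flag and \K by reading group 1.
// fruitsAfterAnchor is a hypothetical helper name, not part of any library.
function fruitsAfterAnchor(subject) {
  // Locate the anchor text once (the "search from here " alternative).
  const anchor = /search from here /i.exec(subject);
  if (anchor === null) return [];

  // Sticky regex: every exec() must begin exactly at lastIndex.
  const step = /.*?(apple|banana|orange)/iy;
  step.lastIndex = anchor.index + anchor[0].length;

  const found = [];
  let m;
  while ((m = step.exec(subject)) !== null) {
    found.push(m[1]); // keep only the fruit, not the skipped text
  }
  return found;
}

console.log(fruitsAfterAnchor("apple search from here apple, banana, apple."));
```

The apple before the anchor is never visited, because the scan starts only after "search from here ".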

Upvotes: 1

Steve4585

Reputation: 71

The final solution involves two separate regex searches; see below.

Originally, you had only 1 match because only one "apple" is immediately preceded by "search from here ". Furthermore, the rest of the original pattern matches zero times, since that apple is followed by a comma, not whitespace. Thus you had 1 match with 1 group.

One possibility is to make use of capture groups. If you insert a comma before \s+, so that the comma in the pattern absorbs the comma in the subject string, then you will get the second apple in the last capture group. I would also insert ?: before the comma to avoid unnecessary capturing:

(?i)(?<=search from here\s)(apple|banana|orange)(?:,\s+(apple|banana|orange))*

Now we have 1 match for the whole list, and 2 groups with apples. Note, however, that repeated capture groups store only the last match, so "banana" will not be stored. Although it is matched by group 2, it is later overwritten by the last "apple". We could rewrite the pattern omitting the quantifier *:

(?i)(?<=search from here\s)(apple|banana|orange)(?:,\s+(\g'1'))?(?:,\s+(\g'1'))?(?:,\s+(\g'1'))?(?:,\s+(\g'1'))?

To avoid code repetition, \g'1' is used to represent the same expression as given in the 1st capture group (i.e., "apple|banana|orange"). Now you have 1 match with (here up to 5) groups for all the fruits. But still not multiple matches.
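
The difference between the quantified group and the unrolled optional groups can be checked with a small JavaScript sketch. JavaScript has no \g'1' subroutine call, so this version assumes the fruit alternation is kept in a string (the variable fruit, a name chosen for the example) and pasted into each pattern instead. It also assumes a runtime with lookbehind support (ES2018).

```javascript
const subject = "apple search from here apple, banana, apple.";
// Stand-in for \g'1': the shared alternation as a string, one capture group per use.
const fruit = "(apple|banana|orange)";

// Quantified group: group 2 is overwritten on every repetition.
const repeated = new RegExp(
  `(?<=search from here\\s)${fruit}(?:,\\s+${fruit})*`, "i");
const repeatedGroups = subject.match(repeated).slice(1);

// Unrolled optional groups: each fruit gets its own group (up to 5 here).
const unrolled = new RegExp(
  `(?<=search from here\\s)${fruit}` + `(?:,\\s+${fruit})?`.repeat(4), "i");
const unrolledGroups = subject.match(unrolled).slice(1)
  .filter((g) => g !== undefined); // drop optional groups that did not take part

console.log(repeatedGroups); // "banana" is missing here
console.log(unrolledGroups); // all three fruits, still inside a single match
```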

If you want multiple matches, one for each fruit that is somewhere (not necessarily immediately) preceded by "search from here", that would need a variable-length lookbehind assertion, which many regex engines (PCRE among them) do not allow. I would rather suggest splitting the problem into two separate regex searches:

  1. The pattern (?i)(?<=search from here\s).* matches the interesting second half of the test text: "apple, banana, apple."
  2. The pattern (?i)\b(?:apple|banana|orange)\b with the g (global) modifier applied on the result of step 1 will yield "apple", "banana", and "apple".

MWE in PHP:

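<?php
$subject = 'apple search from here apple, banana, apple.';
// Step 1: keep only the text after "search from here ".
if (preg_match('/(?<=search from here\s).*/i', $subject, $new_subjects)) {
    // Step 2: collect every fruit in that remaining text.
    preg_match_all('/\b(?:apple|banana|orange)\b/i', $new_subjects[0], $result);
    var_dump($result); // $result[0] holds the matched words
}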

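MWE in JavaScript:

const subject = "apple search from here apple, banana, apple.";
// Step 1: keep only the text after "search from here ".
const new_subjects = subject.match(/(?<=search from here\s).*/i);
// Step 2: collect every fruit in that text (null if step 1 found nothing).
const result = new_subjects ? new_subjects[0].match(/\b(?:apple|banana|orange)\b/ig) : null;
console.log(result);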
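
The restriction on variable-length lookbehind applies to PCRE-style engines. JavaScript's lookbehind (ES2018) does accept variable-length patterns, so in that environment a single global search may be enough. A sketch, assuming a runtime with lookbehind support and a subject without line breaks (the dot does not match newlines unless the s flag is added):

```javascript
const subject = "apple search from here apple, banana, apple.";
// Variable-length lookbehind: the anchor text may sit anywhere earlier on the line.
const afterAnchor = /(?<=search from here\s.*)\b(?:apple|banana|orange)\b/gi;
// match() with the g flag returns null when nothing matches, hence the fallback.
const matches = subject.match(afterAnchor) || [];
console.log(matches);
```

Each candidate word triggers its own backward scan, so for long inputs the two-step approach is likely to be cheaper.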

Upvotes: 0
