Reputation: 1869
I am trying to capture one or more variables in a string, not just the first value found. I created a test regex
color.*?(?<COLOR>(red|blue|black)).*?.
and test sentence
favorite colors are red, blue and black. Mr. Green
which can be seen here http://regex101.com/r/vV7bP3/2
My goal is to get a match for each of red, blue and black, and not for Green, which comes after the period. In other words, I am looking for a match for every color in a sentence that contains the word 'color', but only up to the next period. (I understand this is a two-part question; I thought it was easier than posting twice.)
Upvotes: 4
Views: 28006
Reputation: 174706
You could try the regex below to capture the colors that appear before the literal dot:
color[^\.]*(red|blue|black|Green)[^\.]*(red|blue|black|Green)[^\.]*(red|blue|black|Green)[^\.]*
Upvotes: 0
Reputation: 89557
To find several colors between the word color and a dot, you can use this pattern in a global search (compatible with PHP/PCRE, Perl, Ruby 2.0, Java, .NET):
(?:\G(?!\A)|\bcolors?\b)[^.]+?\b(?<colors>red|bl(?:ue|ack))\b
The idea is to use the \G anchor, which matches at the position where the previous match ended. This technique uses two entry points at the beginning of the pattern (in the non-capturing group).
The first match uses the word "color" as its entry point, and each subsequent match uses the entry point with the \G anchor. The (?!\A) lookahead prevents \G from also succeeding at the very start of the string, where no previous match exists.
Since [^.] is the only character class used between the entry point and the color, a match can never cross a dot, so you can't obtain other results after the dot. (Note that this can cause problems with abbreviations like Mr. or acronyms like U.S.A.)
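As a rough illustration of the global search described above, here is a minimal Java sketch. The class name ColorFinder and the method findColors are hypothetical names chosen for this example; only the pattern itself comes from the answer. It loops over successive matches and collects the named group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class ColorFinder {
    // Pattern from the answer; backslashes are doubled for the Java string literal.
    private static final Pattern COLORS = Pattern.compile(
        "(?:\\G(?!\\A)|\\bcolors?\\b)[^.]+?\\b(?<colors>red|bl(?:ue|ack))\\b");

    // Collects the named group "colors" from every successive match.
    static List<String> findColors(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = COLORS.matcher(text);
        // Each find() resumes where the previous match ended, which is
        // exactly the position the \G entry point accepts.
        while (m.find()) {
            found.add(m.group("colors"));
        }
        return found;
    }

    public static void main(String[] args) {
        String sentence = "favorite colors are red, blue and black. Mr. Green";
        System.out.println(findColors(sentence)); // prints [red, blue, black]
    }
}
```

After "black" the next character is the dot, so the \G branch cannot continue, and the "color" branch finds no further occurrence of the word, which ends the search.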
Note: You can reduce the work for the regex engine by adding .*? before "color" (this avoids testing every character before "color" one by one against both entry points):
(?:\G(?!\A)|.*?\bcolors?\b)[^.]+?\b(?<colors>red|bl(?:ue|ack))\b
Alternatively, you can move the word boundary to the beginning of the pattern so that it fails faster (since each match ends at a word boundary):
\b(?:\G(?!\A)|colors?\b)[^.]+?\b(?<colors>red|bl(?:ue|ack))\b
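To check that the two variants return the same colors as the original pattern, a small Java sketch along these lines may help. The class name PatternVariants and the method colors are hypothetical; the sketch only compares results on the sample sentence and does not measure speed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison harness, named for illustration only.
class PatternVariants {
    // Original pattern from the answer.
    static final String BASE =
        "(?:\\G(?!\\A)|\\bcolors?\\b)[^.]+?\\b(?<colors>red|bl(?:ue|ack))\\b";
    // Variant with a lazy .*? in front of "color".
    static final String LAZY_PREFIX =
        "(?:\\G(?!\\A)|.*?\\bcolors?\\b)[^.]+?\\b(?<colors>red|bl(?:ue|ack))\\b";
    // Variant with the word boundary moved in front of the group.
    static final String LEADING_BOUNDARY =
        "\\b(?:\\G(?!\\A)|colors?\\b)[^.]+?\\b(?<colors>red|bl(?:ue|ack))\\b";

    // Runs a global search with the given regex and returns the captured colors.
    static List<String> colors(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group("colors"));
        }
        return found;
    }

    public static void main(String[] args) {
        String sentence = "favorite colors are red, blue and black. Mr. Green";
        for (String regex : List.of(BASE, LAZY_PREFIX, LEADING_BOUNDARY)) {
            System.out.println(colors(regex, sentence));
        }
    }
}
```

The overall match text differs between variants (the .*? version also consumes the text before "color"), but the named group is what matters here.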
Upvotes: 7