user3649739

Reputation: 1869

Match multiple values in a regex string

I am trying to capture every matching value in a string, not just the first one found. I created a test regex

color.*?(?<COLOR>(red|blue|black)).*?.

and test sentence

favorite colors are red, blue and black.  Mr. Green

which can be seen here http://regex101.com/r/vV7bP3/2

My goal is to get a match for each of red, blue and black, but not for Green, which comes after the period. In other words, I am looking for a match for every color in a sentence containing the word 'color', but only up to the next period. (I understand this is a two-part question; I thought it was easier than posting twice.)

Upvotes: 4

Views: 28006

Answers (2)

Avinash Raj

Reputation: 174706

You could try the regex below to capture the colors that appear before the literal dot:

color[^\.]*(red|blue|black|Green)[^\.]*(red|blue|black|Green)[^\.]*(red|blue|black|Green)[^\.]*
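As a rough illustration of how this pattern behaves, here is a minimal Java sketch. The class name `FixedCountColors` and the helper `capture` are hypothetical names chosen for this example. Because the pattern spells out three capture groups, it only matches when exactly three colors can be found between "color" and the dot; the second input, which has two colors, is an assumption added to show that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedCountColors {
    // The pattern from the answer, unchanged (backslashes doubled for a Java string literal).
    static final Pattern THREE_COLORS = Pattern.compile(
        "color[^\\.]*(red|blue|black|Green)[^\\.]*(red|blue|black|Green)[^\\.]*(red|blue|black|Green)[^\\.]*");

    // Returns the contents of the three capture groups, or an empty list if the pattern does not match.
    static List<String> capture(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = THREE_COLORS.matcher(text);
        if (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                found.add(m.group(g));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Three colors before the dot: all three groups are filled.
        System.out.println(capture("favorite colors are red, blue and black.  Mr. Green"));
        // Only two colors before the dot: the pattern as a whole fails.
        System.out.println(capture("favorite colors are red and blue.  Mr. Green"));
    }
}
```

With this approach the number of colors is fixed by the number of groups in the pattern, so a sentence with fewer colors yields no match at all.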


Upvotes: 0

Casimir et Hippolyte

Reputation: 89557

To find several colors between the word color and a dot, you can use this pattern in a global search (compatible PHP/PCRE, Perl, Ruby 2.0, Java, .NET):

(?:\G(?!\A)|\bcolors?\b)[^.]+?\b(?<colors>red|bl(?:ue|ack))\b

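The idea is to use the \G anchor, which matches at the position where the previous match ended. This technique uses two entry points at the beginning of the pattern (in the non-capturing group). The (?!\A) lookahead is there because \G also matches at the very start of the string, which must not count as an entry point.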

The first match result uses the word "color" as entry point for the pattern, and the next match results use the entry point with the \G anchor.

Since [^.] is the only character class allowed between an entry point and a color, a match can never extend past the dot. (Note that this can cause problems with abbreviations like Mr. or acronyms like U.S.A.)
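To make the global search concrete, here is a minimal sketch in Java (one of the flavors listed above). The class name `ColorFinder` and the method `findColors` are hypothetical names for this example, and the second test sentence is an assumption added to illustrate the abbreviation caveat. Each call to `find()` resumes where the previous match ended, which is what lets the \G branch pick up the next color.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColorFinder {
    // The first pattern from the answer (backslashes doubled for a Java string literal).
    static final Pattern COLORS = Pattern.compile(
        "(?:\\G(?!\\A)|\\bcolors?\\b)[^.]+?\\b(?<colors>red|bl(?:ue|ack))\\b");

    // Collects the named group "colors" from every successive match.
    static List<String> findColors(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = COLORS.matcher(text);
        while (m.find()) {
            found.add(m.group("colors"));
        }
        return found;
    }

    public static void main(String[] args) {
        // The sentence from the question: Green comes after the dot and is not reached.
        System.out.println(findColors("favorite colors are red, blue and black.  Mr. Green"));
        // The dot in "Mr." stops the search before any color is reached.
        System.out.println(findColors("My colors, says Mr. Smith, are red."));
    }
}
```

The first call should collect red, blue and black; the second should come back empty, since [^.] cannot step over the dot in "Mr.".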

Note: You can reduce the work for the regex engine by adding .*? before "color" (this avoids testing every character before "color" one by one against the two entry points):

(?:\G(?!\A)|.*?\bcolors?\b)[^.]+?\b(?<colors>red|bl(?:ue|ack))\b

or you can move the word boundary to the beginning so that the pattern fails faster (since each match ends at a word boundary):

\b(?:\G(?!\A)|colors?\b)[^.]+?\b(?<colors>red|bl(?:ue|ack))\b
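As a quick sanity check, the sketch below runs the three variants against the sentence from the question in Java. The class name `EntryPointVariants` and its members are hypothetical names for this example. It only compares the captured colors and does not measure how much work the engine does for each variant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntryPointVariants {
    // The three patterns from the answer, in order (backslashes doubled for Java string literals).
    static final String[] VARIANTS = {
        "(?:\\G(?!\\A)|\\bcolors?\\b)[^.]+?\\b(?<colors>red|bl(?:ue|ack))\\b",
        "(?:\\G(?!\\A)|.*?\\bcolors?\\b)[^.]+?\\b(?<colors>red|bl(?:ue|ack))\\b",
        "\\b(?:\\G(?!\\A)|colors?\\b)[^.]+?\\b(?<colors>red|bl(?:ue|ack))\\b"
    };

    // Collects the named group "colors" from every successive match of the given regex.
    static List<String> findColors(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group("colors"));
        }
        return found;
    }

    public static void main(String[] args) {
        String sentence = "favorite colors are red, blue and black.  Mr. Green";
        for (String regex : VARIANTS) {
            System.out.println(findColors(regex, sentence));
        }
    }
}
```

All three variants should report the same colors for this sentence; they differ only in where the engine starts and how early a failed attempt is abandoned.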

Upvotes: 7
