Reputation: 348
I'd like to recognise a specific pattern in a large chunk of text. I'll be using the C#/.NET regex library.
For example:
1. This camera support Monochrome, Neutral, Standard, Landscape and Portrait [...More words...] settings furnish advanced, personalized color control.
The output should be: Array ["Monochrome", "Neutral", "Standard", "Landscape", "Portrait"]
It should also ignore "advanced", even though it is followed by a comma and another word.
I'm currently using the expression (([\S]+)( {0,3})?(,|and))
which returns all the words up to "and". Can you suggest an expression that also captures the word after "and"?
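To make the problem concrete, here is a minimal sketch that runs the current expression and collects the second capture group (the word before each separator). It uses Python's re module purely for illustration (an assumption; the question is about .NET, where this backtracking pattern should behave the same way). The sample text is the question's example with the elided "[...More words...]" part simply dropped.

```python
import re

# The question's example sentence, with the elided "[...More words...]" part dropped.
text = ("This camera support Monochrome, Neutral, Standard, Landscape and Portrait "
        "settings furnish advanced, personalized color control.")

# The current expression: a run of non-space characters, optional spaces,
# then a comma or "and". The separator is consumed as part of each match.
current = re.compile(r"(([\S]+)( {0,3})?(,|and))")

# Group 2 holds the word in front of each separator.
words = [m.group(2) for m in current.finditer(text)]
print(words)
```

This prints ['Monochrome', 'Neutral', 'Standard', 'Landscape', 'advanced']: "Portrait" is missing because nothing follows it, and "advanced" slips in because it is followed by a comma.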
Cheers! Nilay
Upvotes: 0
Views: 545
Reputation: 2820
Matching the list isn't too hard, but splitting it into its individual elements is harder, and I suspect the mechanisms I'd use in Perl are language-dependent (I don't use Microsoft products, so I can't give it to you in C#).
In Perl, I'd do something like the following. It's not a single-regex answer, but I think the code is clearer for it.
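use strict;
use warnings;

my $string = "This camera support Monochrome, Neutral, Standard, Landscape and Portrait foo bar baz";
# qr// keeps \w intact (in a plain double-quoted string "\w" loses its backslash)
my $re_sep  = qr/(?: {0,3}, {0,3}| {1,3}and {1,3})/;  # a comma, or "and", between items
my $re_list = qr/\w+(?:$re_sep\w+)+/;                  # two or more words joined by separators
my ($list) = $string =~ m/($re_list)/;                 # first list-like run in the string
my @list_elements = split /$re_sep/, $list;            # (Monochrome, Neutral, Standard, Landscape, Portrait)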
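For readers who don't use Perl, here is a rough Python translation of the same two-step idea (Python is an assumption here; in .NET, Regex.Match followed by Regex.Split should play the same roles). The first regex finds the whole list, and the separator regex then splits it into its elements. Raw strings keep \w intact, so the escaping issue that affects Perl double-quoted strings does not arise.

```python
import re

text = "This camera support Monochrome, Neutral, Standard, Landscape and Portrait foo bar baz"

# Separator between items: a comma with up to 3 spaces on either side,
# or "and" surrounded by 1 to 3 spaces (mirrors the Perl $re_sep).
re_sep = r"(?: {0,3}, {0,3}| {1,3}and {1,3})"

# Two or more words joined by separators (mirrors the Perl $re_list).
re_list = re.compile(r"\w+(?:" + re_sep + r"\w+)+")

match = re_list.search(text)  # leftmost list-like run in the text
# re_sep has no capturing groups, so re.split returns only the items.
list_elements = re.split(re_sep, match.group(0)) if match else []
print(list_elements)
```

Because the separator requires spaces around "and", words that merely contain "and" (such as "Standard" or "Landscape") are not split.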
Upvotes: 0
Reputation: 348
I found the answer using lookarounds.
The problem: once the regex consumes text, the engine's cursor has already moved past it when it looks for the next match. For example, in
Monochrome, Neutral, Standard, Landscape and Portrait
if "and" is consumed as part of the match for "Landscape", it is no longer available to the next match, and therefore "Portrait" is never captured. The right approach is to use lookarounds, both lookahead and lookbehind, which check the separators without consuming them.
The correct lookahead is
(?=( {0,1})?(,|and))
and the correct lookbehind is
(?<=( {1,3}(and|or) {1,3}))
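A minimal sketch of how the two lookarounds might be combined into one pattern: either a word followed by a comma or "and", or a word preceded by "and"/"or". It is written in Python for illustration (an assumption). .NET accepts the variable-width lookbehind ( {1,3}(and|or) {1,3}), but Python's re module only allows fixed-width lookbehind, so this version uses two fixed-width lookbehinds and assumes exactly one space around "and"/"or".

```python
import re

text = "This camera support Monochrome, Neutral, Standard, Landscape and Portrait foo bar baz"

# Alternative 1: a word followed (after at most one space) by a comma or "and";
#   the lookahead checks the separator without consuming it.
# Alternative 2: a word preceded by " and " or " or ". Python needs fixed-width
#   lookbehind, so exactly one space on each side is assumed here.
pattern = re.compile(r"\w+(?= ?(?:,|and))|(?:(?<= and )|(?<= or ))\w+")

words = pattern.findall(text)
print(words)

# The question's full example sentence, with "[...More words...]" dropped.
question_text = ("This camera support Monochrome, Neutral, Standard, Landscape and Portrait "
                 "settings furnish advanced, personalized color control.")
print(pattern.findall(question_text))
```

On the first string this finds all five items, including "Portrait". On the question's sentence it also returns "advanced", since that word is followed by a comma, so the lookarounds alone don't enforce the "ignore advanced" requirement.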
Upvotes: 0