Nilay Parikh

Reputation: 348

Regex - recognize sentence pattern

I'd like to recognise a specific pattern in a large chunk of text. I'll be using the C#/.NET regex library.

For example:

1. This camera support Monochrome, Neutral, Standard, Landscape and Portrait [...More words...] settings furnish advanced, personalized color control.
Output should be: Array ["Monochrome", "Neutral", "Standard", "Landscape", "Portrait"]

It should also avoid capturing "advanced", as the comma there is followed by an ordinary word.

I'm currently using the expression (([\S]+)( {0,3})?(,|and)), which returns all the words up to "and". Can you suggest an expression that also covers the word after "and"?

Cheers! Nilay

Upvotes: 0

Views: 545

Answers (3)

mc0e

Reputation: 2820

Matching the list isn't too hard, but splitting it into its elements correctly is harder, and I suspect the mechanisms I'd use in Perl are language dependent (I don't use Microsoft products, so I won't give it to you in C#).

In Perl, I'd do it something like the following. It's not a single-regex answer, but I think the code is clearer for that.

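use strict;
use warnings;

my $string = "This camera support Monochrome, Neutral, Standard, Landscape and Portrait foo bar baz";

# Separator between items: a comma or the word "and", with optional spacing.
my $re_sep  = qr/(?: {0,3}, {0,3}| {1,3}and {1,3})/;
# qr// is used because "\w" inside a double-quoted string loses its backslash.
my $re_list = qr/\w+(?:$re_sep\w+)+/;

# Grab the whole list first, then split it on the separators.
my ($list) = $string =~ m/($re_list)/;
my @list_elements = split /$re_sep/, $list;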
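For readers who don't use Perl, here is a minimal sketch of the same two-step idea (match the whole list, then split it on the separators) in Python. The function name extract_list is made up for illustration. The pattern uses only syntax that .NET's Regex class also accepts, so it should carry over to Regex.Match and Regex.Split, though I haven't tested it there.

```python
import re

# Separator between items: a comma or the word "and", with optional spacing.
SEP = r"(?: {0,3}, {0,3}| {1,3}and {1,3})"
# A list is a word followed by one or more separator+word pairs.
LIST = re.compile(rf"\w+(?:{SEP}\w+)+")


def extract_list(text):
    """Return the items of the first comma/'and' list in text, or []."""
    match = LIST.search(text)
    if match is None:
        return []
    # SEP has no capturing groups, so split() returns only the items.
    return re.split(SEP, match.group(0))


sample = ("This camera support Monochrome, Neutral, Standard, "
          "Landscape and Portrait foo bar baz")
print(extract_list(sample))
# ['Monochrome', 'Neutral', 'Standard', 'Landscape', 'Portrait']
```

Because only the first match is used, a later comma pair such as "advanced, personalized" in the full sentence is ignored.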

Upvotes: 0

Nilay Parikh

Reputation: 348

I found the correct answer using lookaround.

The problem: the regex engine consumes whatever it matches. In "Monochrome, Neutral, Standard, Landscape and Portrait", if "and" is part of the capture for "Landscape", it is no longer available to the next match, so "Portrait" is never captured. The right approach is to use lookahead and lookbehind, which check the surrounding text without consuming it.

(?=( {0,1})?(,|and)) is the correct lookahead, and (?<=( {1,3}(and|or) {1,3})) is the correct lookbehind.
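A rough sketch of how the lookahead and lookbehind can be combined into one pattern, written in Python rather than C#. It is not the exact expression above. Python's re module only allows fixed-width lookbehind, so the sketch assumes exactly one space on each side of "and", whereas .NET accepts the variable-width {1,3} form. The \b anchors and the name list_items are my additions, so that "and" inside a word such as "Standard" is not treated as a separator.

```python
import re

# First alternative: a whole word followed by a comma or by " and" (lookahead).
# Second alternative: a whole word preceded by " and " (lookbehind).
# Neither lookaround consumes the separator, so the next match can still see it.
ITEM = re.compile(r"\b\w+\b(?= {0,3},| {1,3}and\b)|(?<= and )\w+\b")


def list_items(text):
    """Return every word that sits next to a comma or 'and' separator."""
    return ITEM.findall(text)


print(list_items("Monochrome, Neutral, Standard, Landscape and Portrait"))
# ['Monochrome', 'Neutral', 'Standard', 'Landscape', 'Portrait']
```

Note that a per-word pattern like this also picks up "advanced" in the full sentence from the question, because that word is followed by a comma too, so some extra filtering would still be needed there.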

Upvotes: 0

user23031988

Reputation: 330

Have you tried:

 (([\S]+)( {0,3})?(,|and|\.))

http://regexr.com?355ci

Upvotes: 2
