itnovice

Reputation: 513

Posix regex capture group matching sequence

I have the following text string and regex pattern in a C program:

char text[] = "        identification     division. "; 
char pattern[] = "^(.*)(identification *division)(.*)$"; 

Using the regexec() library function, I got the following results:

String:         identification     division. 
Pattern: ^(.*)(identification *division)(.*)$ 
Total number of subexpressions: 3 

OK, pattern has matched  ... 

begin: 0, end: 37,match:         identification     division. 
subexpression 1 begin: 0, end: 8, match: 
subexpression 2 begin: 8, end: 35, match: identification     division 
subexpression 3 begin: 35, end: 37, match: . 

I was wondering: since the regex engine matches greedily and the first capture group (.*) matches any number of characters (except newline characters), why doesn't it match all the way to the end of the text string (up to the '.'), as opposed to matching only the first 8 spaces?

Does each capture group have to be matched?

Are there any rules on how the capture group matches the text string?

Thanks.

Upvotes: 0

Views: 1190

Answers (2)

Kjell Andreassen

Reputation: 763

Just as you said, if the greedy group (.*) had consumed the whole string, the rest of the regex would have had nothing left to match, and the regex as a whole would not have matched the string. So yes, in this pattern each capture group (and every other part of the pattern) needs to match. This is exactly what you specified in your regex.

Try the following string instead, and run the code with both a reluctant and a greedy first group; you will see the difference.

char text[] = "    identification  division    identification     division. ";
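A minimal sketch of that experiment, using only the standard regcomp(), regexec() and regfree() calls. The helper name match_groups is made up for illustration. Only the greedy form is shown, because POSIX extended regular expressions leave a reluctant quantifier such as `.*?` undefined.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of any library): compile `pattern` as a
 * POSIX extended regex, match it against `text`, and store the offsets of
 * the whole match and of the subexpressions in out[0..ngroups-1].
 * Returns 0 on a match, otherwise the nonzero code from regcomp/regexec
 * (REG_NOMATCH when the text simply does not match). */
int match_groups(const char *pattern, const char *text,
                 regmatch_t *out, size_t ngroups)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL && out != NULL);

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;                      /* pattern did not compile */

    rc = regexec(&re, text, ngroups, out, 0);
    regfree(&re);
    return rc;
}
```

Called with the pattern from the question, the two-occurrence string and an array of four regmatch_t, I would expect out[1] to span offsets 0 to 32 and out[2] to span 32 to 59. That is, the greedy first group swallows the first "identification  division" and stops only where the last possible match for the second group begins. With the original string, out[1] spans 0 to 8, as in the question's output.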

Upvotes: 0

Dave

Reputation: 11162

Regexes are as greedy as possible without being too greedy. Had the left group been as greedy as you expect, the group that matches "identification division" would have been unable to match, and the regex would have erroneously rejected text that is clearly in the language.
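To make "as greedy as possible without being too greedy" concrete, here is a hand-rolled sketch of the idea for this one pattern. It is not how regexec() is implemented, and the function names keyword_len and greedy_prefix_end are invented for the illustration. It tries the longest candidate for the first group, then gives back one character at a time until the middle group can match. The trailing (.*)$ needs no check because it accepts whatever is left.

```c
#include <assert.h>
#include <string.h>

/* Length of an "identification *division" match at the very start of s,
 * or 0 if s does not begin with one. */
size_t keyword_len(const char *s)
{
    static const char first[] = "identification";
    static const char second[] = "division";
    size_t n = strlen(first);

    if (strncmp(s, first, n) != 0)
        return 0;
    while (s[n] == ' ')                 /* " *": zero or more spaces */
        n++;
    if (strncmp(s + n, second, strlen(second)) != 0)
        return 0;
    return n + strlen(second);
}

/* End offset of the first group under greedy matching: start with the
 * whole string and back off until the keyword group matches right after
 * it. Returns -1 if the keyword never matches (overall no match). */
long greedy_prefix_end(const char *text)
{
    size_t len, i;

    assert(text != NULL);
    len = strlen(text);
    for (i = len + 1; i-- > 0; ) {      /* i = len, len-1, ..., 0 */
        if (keyword_len(text + i) > 0)
            return (long)i;
    }
    return -1;
}
```

For the string in the question, greedy_prefix_end returns 8: every longer candidate for the first group leaves the middle group with nothing it can match, so those candidates are rejected.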

Upvotes: 1
