user51462

Reputation: 2022

Nesting capture groups

I have the following strings:

'TwoOrMoreDimensions'
'LookLikeVectors'
'RecentVersions'
'= getColSums'
'=getColSums'

I would like to capture every uppercase letter that is preceded by a lowercase letter, in all strings except the last two.

I can use ([a-z]+)([A-Z]) to capture all such occurrences but I don't know how to exclude matches from the last two strings.

The last two strings can be excluded using the negative lookahead ^(?!>\s|\=). Is it possible to combine this with the expression above?
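Both halves of the problem can be seen in a minimal Python sketch (Python and its built-in re module are an illustrative choice, not part of the question). The unanchored pattern finds pairs in all five strings, including the two that start with =, while the lookahead on its own rejects exactly those two.

```python
import re

strings = [
    'TwoOrMoreDimensions',
    'LookLikeVectors',
    'RecentVersions',
    '= getColSums',
    '=getColSums',
]

# Unanchored pattern: finds (lowercase run, uppercase letter) pairs anywhere,
# so it also matches inside the two '='-prefixed strings.
pair = re.compile(r'([a-z]+)([A-Z])')

# The question's lookahead on its own: a zero-width match that succeeds only
# when the string does not start with '>' + whitespace or with '='.
not_excluded = re.compile(r'^(?!>\s|\=)')

for s in strings:
    print(repr(s), pair.findall(s), bool(not_excluded.match(s)))
```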

I tried ^(?!>\s|\=)(([a-z]+)([A-Z])) but it doesn't yield any matches. I'm not sure why, because ^(?!>\s|\=)(.+) captures all characters after the start of the matching string as a group. So why can't this capture group be further divided into group 2 ([a-z]+) and group 3 ([A-Z])?

Link to tester

Upvotes: 0

Views: 43

Answers (2)

User 10482

Reputation: 1002

Another solution (it may not be the most efficient, but it does the job) would be (?:^=\s*\w*)|([a-z]+)([A-Z])

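If the string begins with =, this forces the regex to greedily consume the optional whitespace and the word characters that follow (everything, in these strings) in a non-capturing group, which still counts toward the full match. That leaves nothing for the capture groups in the second alternative.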
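A minimal Python sketch of this approach (Python's re module and the helper name pairs are illustrative choices, not from the answer). Because the first alternative still produces a match, just with empty capture groups, the caller has to discard matches where group 1 is None. It also relies on everything after = and any whitespace being word characters, as in the example strings.

```python
import re

strings = [
    'TwoOrMoreDimensions',
    'LookLikeVectors',
    'RecentVersions',
    '= getColSums',
    '=getColSums',
]

# Alternative 1 swallows an '='-prefixed string without capturing anything;
# alternative 2 captures the lowercase run and the uppercase letter.
pattern = re.compile(r'(?:^=\s*\w*)|([a-z]+)([A-Z])')


def pairs(s):
    # Matches produced by alternative 1 have group 1 == None; drop them.
    return [m.groups() for m in pattern.finditer(s) if m.group(1) is not None]


for s in strings:
    print(repr(s), pairs(s))
```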

Regex101 Demo Link

Upvotes: 1

Nick

Reputation: 147146

The issue with your current regex is that the ^ anchors it to the start of the string, so it can only match a sequence of lowercase letters followed by an uppercase letter at the very start of the string, and none of your strings start that way.
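The effect of the ^ anchor can be checked with a short Python sketch (an illustration using Python's re module; the patterns are the ones from the question). The anchored pair pattern fails on every string because none of them begins with a lowercase letter, whereas the (.+) version succeeds on the first three because . accepts the leading uppercase letter.

```python
import re

strings = [
    'TwoOrMoreDimensions',
    'LookLikeVectors',
    'RecentVersions',
    '= getColSums',
    '=getColSums',
]

# The question's attempt: the lowercase run must begin right at position 0.
anchored_pair = re.compile(r'^(?!>\s|\=)(([a-z]+)([A-Z]))')
# The question's working example: (.+) may start with any character.
anchored_any = re.compile(r'^(?!>\s|\=)(.+)')

for s in strings:
    pair_match = anchored_pair.search(s)
    any_match = anchored_any.search(s)
    print(repr(s), pair_match, any_match.group(1) if any_match else None)
```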

One way to do what you want is to use the \G anchor, which forces the current match to start where the previous one ended. It can be used in an alternation with ^(?!=), which matches at the start of any string that doesn't begin with an = sign, followed by a negated character class ([^a-z]*) to skip any non-lowercase characters:

(?:^(?!=)|\G)[^a-z]*(([a-z]+)([A-Z]))

This will give the same capture groups as your original regex.
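Python's built-in re module does not support \G, so the following sketch emulates the same chaining under that limitation: Pattern.match(s, pos) only matches starting exactly at pos, which is what \G enforces when pos is the end of the previous match, and an explicit startswith('=') check stands in for ^(?!=). The helper name chained_pairs is hypothetical. One caveat when porting the regex itself: in PCRE (and Java), \G also succeeds at the very start of the subject, so if each string is tested on its own rather than as lines of one multiline subject as in the demo, '=getColSums' can still match through the \G branch; \G(?!^) is a common guard against that.

```python
import re

strings = [
    'TwoOrMoreDimensions',
    'LookLikeVectors',
    'RecentVersions',
    '= getColSums',
    '=getColSums',
]

# One step of the chain: skip non-lowercase characters, then capture the
# pair as group 1, the lowercase run as group 2 and the uppercase letter as
# group 3, mirroring the groups of the answer's regex.
step = re.compile(r'[^a-z]*(([a-z]+)([A-Z]))')


def chained_pairs(s):
    # Emulates (?:^(?!=)|\G)[^a-z]*(([a-z]+)([A-Z])) for a single string.
    if s.startswith('='):  # stands in for the ^(?!=) branch
        return []
    results = []
    pos = 0
    while True:
        # match() is anchored at pos, like \G at the previous match's end.
        m = step.match(s, pos)
        if m is None:
            break  # the chain stops at the first failure, as with \G
        results.append(m.groups())
        pos = m.end()  # every match consumes at least two characters
    return results


for s in strings:
    print(repr(s), chained_pairs(s))
```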

Demo on regex101

Upvotes: 1
