Jils

Reputation: 783

Groovy - multi capturing group

I want to extract from a string a couple of words whenever they are present. I tried something but the result is not what I expect.

def myString = 'blablabla bla bla blabla New User: John_user function: auditor blablablablabl bla blab New User: Do_user function: auditor2 blablabl blablb...'
def m = myString =~ /\sNew User:\s(.+_user)\sfunction:\s(auditor|auditor2)\s/

I want to have something like:

println m[0][1] //John_user
println m[0][2] //auditor

println m[1][1] //Do_user
println m[1][2] //auditor2

....

Upvotes: 1

Views: 99

Answers (1)

Wiktor Stribiżew

Reputation: 627545

You need to replace .+ with [^_]+:

\sNew User:\s([^_]+_user)\sfunction:\s(auditor|auditor2)\s
              ^^^^^

See regex demo
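As a quick check, here is a minimal sketch that applies the pattern above to the string from the question. Nothing here is new apart from the comments; m[i] is the list of the whole match followed by its groups.

```groovy
// Input string copied from the question
def myString = 'blablabla bla bla blabla New User: John_user function: auditor blablablablabl bla blab New User: Do_user function: auditor2 blablabl blablb...'

// Negated character class: [^_]+ cannot run past the underscore in "_user"
def m = myString =~ /\sNew User:\s([^_]+_user)\sfunction:\s(auditor|auditor2)\s/

// m[i] is the list [wholeMatch, group1, group2] for the i-th match
println m[0][1] // John_user
println m[0][2] // auditor

println m[1][1] // Do_user
println m[1][2] // auditor2
```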

Your regex overmatches because .+ is greedy: it first grabs everything up to the end of the line and then backtracks, looking for a valid match to return. It finds the last occurrence of _user that is followed by the rest of the pattern, and so puts the whole John_user function: auditor blablablablabl bla blab New User: Do_user into the first capturing group. We can avoid that by restricting the pattern to match any character except an underscore (with [^_]+).
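To see the overmatch for yourself, this small sketch runs the original greedy pattern from the question against the same input. The variable name greedy is only illustrative.

```groovy
// Input string copied from the question
def myString = 'blablabla bla bla blabla New User: John_user function: auditor blablablablabl bla blab New User: Do_user function: auditor2 blablabl blablb...'

// Original pattern with the greedy .+ in the first group
def greedy = myString =~ /\sNew User:\s(.+_user)\sfunction:\s(auditor|auditor2)\s/

// Only one match is found, because the first group swallowed both users
println greedy.count  // 1
println greedy[0][1]  // John_user function: auditor blablablablabl bla blab New User: Do_user
println greedy[0][2]  // auditor2
```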

You can achieve a similar result with the lazy .+?, but a negated character class is more efficient. However, if your user names can contain _ symbols, [^_]+ will no longer match them, so you will have to use .+? instead, and the regex will look like

\sNew User:\s(.+?_user)\sfunction:\s(auditor|auditor2)\s

See this demo
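The difference between the two variants can be sketched as follows. The sample text and the user name John_Doe_user are made up for illustration (they are not from the question); the point is only that the negated class skips a name containing an extra underscore, while the lazy version keeps it.

```groovy
// Hypothetical input: the first user name contains an extra underscore
def text = 'x New User: John_Doe_user function: auditor y New User: Ann_user function: auditor2 z'

// Negated class: cannot match "John_Doe_user", so that entry is skipped
def strict = (text =~ /\sNew User:\s([^_]+_user)\sfunction:\s(auditor|auditor2)\s/).collect { it[1] }

// Lazy quantifier: expands one character at a time until the rest of the pattern fits
def lazy = (text =~ /\sNew User:\s(.+?_user)\sfunction:\s(auditor|auditor2)\s/).collect { it[1] }

println strict // [Ann_user]
println lazy   // [John_Doe_user, Ann_user]
```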

Upvotes: 1
