Reputation: 73
I have a test string like this:
08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts
I want to use a regex to match "ABCD" and "35" in this string:
def regexString = ~ /(\s\d{1,5}[^\d\]\-\:\,\.])|([A-Z]{4}\:)/
............
while (matcher.find()) {
acct = matcher.group(1)
grpName = matcher.group(2)
println ("group : " +grpName + " acct : "+ acct)
}
My current output is:
group : ABCD: acct : null
group : null acct : 35
But I expected something like this:
group : ABCD: acct : 35
Is there any option to match all the patterns in the string before it enters the while() loop, or a better way to implement this?
Upvotes: 2
Views: 4223
Reputation: 626747
You may use:
String s = "08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts"
def res = s =~ /\b([A-Z]{4}):[^\]\[\d]*(\d{1,5})\b/
if (res.find()) {
println "${res[0][1]}, ${res[0][2]}"
} else {
println "not found"
}
See the Groovy demo.
The regex \b([A-Z]{4}):[^\]\[\d]*(\d{1,5})\b matches a whole word consisting of 4 uppercase ASCII letters (captured into Group 1) followed by :, then 0+ chars other than [, ] and digits, and then matches and captures into Group 2 a whole number consisting of 1 to 5 digits.
See the regex demo.
In the code, the =~ operator makes the regex engine find a partial match (i.e. it searches for the pattern anywhere inside the string), and the res variable gives access to the match objects: the whole match is in res[0][0], Group 1 in res[0][1] and Group 2 in res[0][2].
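To make the res[0][n] layout concrete, here is a small sketch (assuming a standard Groovy runtime, where indexing a Matcher built from a pattern with capturing groups returns a list of strings). It destructures the first match into the whole match and the two groups; note that res[0] assumes at least one match exists, which is why the answer guards it with if (res.find()).

```groovy
String s = "08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts"

def res = s =~ /\b([A-Z]{4}):[^\]\[\d]*(\d{1,5})\b/

// Because the pattern has capturing groups, res[0] is a list:
// [whole match, Group 1, Group 2]
def (whole, grpName, acct) = res[0]

println "whole : ${whole}"
println "group : ${grpName} acct : ${acct}"
```

The whole match spans from "ABCD:" up to and including "35", since everything in between is consumed by [^\]\[\d]*.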
Upvotes: 1
Reputation: 110
I believe your issue is the alternation (the | "or") in your regex. Each call to matcher.find() returns a single match, and that match satisfies only one side of the |: the first match is "ABCD:" (second alternative, so group(1) is null) and the next one is " 35 " (first alternative, so group(2) is null). You need a regex that matches both parts in one match. You can reverse the groups so they match in order:
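A quick way to see this is to run the question's original pattern and record every group of every match. This is a minimal sketch using the test string from the question; the matches list is only there for inspection.

```groovy
String s = "08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts"

// The question's pattern: two alternatives joined by |
def matcher = s =~ /(\s\d{1,5}[^\d\]\-\:\,\.])|([A-Z]{4}\:)/

def matches = []
while (matcher.find()) {
    // Only the alternative that matched fills its group; the other group is null
    matches << [matcher.group(0), matcher.group(1), matcher.group(2)]
}
matches.each { println it }
```

Two separate matches come back, each with one null group, which is exactly the output shown in the question.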
/([A-Z]{4}):.*\s(\d{1,5})[^\d\]\-:,.]/
Also notice the change in parentheses so you don't capture more than you need. Currently you are capturing the ':' after the group name, plus an extra space before the acct and one character after it. This assumes "ABCD" always comes before "35".
There is also a lot more you can do assuming that all your strings are formatted very similarly:
For example, if there is always a space after the acct number you could simplify it to:
/([A-Z]{4}):.*\s(\d{1,5})\s/
There's probably a lot more you could do to make sure you're always capturing the correct things, but I'd have to see or know more about the dataset to do so.
Then, of course, you have to switch the order of the groups in your code:
while (matcher.find()) {
grpName = matcher.group(1)
acct = matcher.group(2)
println ("group : " +grpName + " acct : "+ acct)
}
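Putting the corrected pattern and the swapped group order together gives something like the sketch below. It assumes the matcher is built with Groovy's =~ operator (the question elides how matcher is created), and results is a hypothetical list added only to make the outcome easy to check.

```groovy
String s = "08:28:57,990 DEBUG [http-0.0.0.0-18080-33] [tester] [1522412937602-580613] [TestManager] ABCD: loaded 35 test accounts"

// Group 1 = group name (without the ':'), Group 2 = account count (digits only)
def matcher = s =~ /([A-Z]{4}):.*\s(\d{1,5})[^\d\]\-:,.]/

def results = []
while (matcher.find()) {
    def grpName = matcher.group(1)
    def acct = matcher.group(2)
    results << [grpName, acct]
    println("group : " + grpName + " acct : " + acct)
}
```

One design note: because .* is greedy, if a line had more than one space-prefixed number after the group name, this pattern would capture the last one; using a lazy .*? instead would capture the first.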
Upvotes: 0