Reputation: 1725
If I have the following pattern in some text:
def articleContent = "<![CDATA[ Hellow World ]]>"
I would like to extract the "Hellow World" part, so I use the following code to match it:
def contentRegex = "<![CDATA[ /(.)*/ ]]>"
def contentMatcher = ( articleContent =~ contentRegex )
println contentMatcher[0]
However, I keep getting a null pointer exception because the regex doesn't seem to be working. What would be the correct regex for "any piece of text", and how do I collect it from a string?
Upvotes: 50
Views: 91181
Reputation: 387
In my case, the actual string was multi-line, like the one below:
ID : AB-223
Product : Standard Profile
Start Date : 2020-11-19 00:00:00
Subscription : Annual
Volume : 11
Page URL : null
Commitment : 1200.00
Actual Start Date : 2020-11-25 00:00:00
I wanted to extract the Start Date value from this string, so here is what my script looks like:
def matches = (originalData =~ /(?<=Actual Start Date :).*/)
def extractedData = matches[0]
This regex extracts the rest of each line that follows the prefix Actual Start Date : (the lookbehind in the pattern).
In my case, the result is 2020-11-25 00:00:00
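As a hedged variation on the idea above, the sketch below uses a capture group instead of a lookbehind and checks for a match before reading it. The `sample` text is a shortened, hypothetical copy of the record, and the names `matcher`, `actualStart` and `allDates` are made up for illustration.

```groovy
// Hypothetical, shortened sample in the same shape as the record above
def sample = '''ID : AB-223
Start Date : 2020-11-19 00:00:00
Actual Start Date : 2020-11-25 00:00:00
'''

// (?m) lets ^ match at the start of every line; group 1 holds the value,
// and \s* swallows the space after the colon
def matcher = (sample =~ /(?m)^Actual Start Date :\s*(.*)/)

// find() returns false when nothing matches, so no exception is thrown
def actualStart = matcher.find() ? matcher.group(1) : null
assert actualStart == '2020-11-25 00:00:00'

// findAll with a closure collects the captured value from every matching line
def allDates = sample.findAll(/(?m)^(?:Actual )?Start Date :\s*(.*)/) { full, value -> value }
assert allDates == ['2020-11-19 00:00:00', '2020-11-25 00:00:00']
```

Indexing with `matches[0]` is shorter, but it throws when the prefix is absent, so the `find()` guard may be preferable when the input is not fully under your control.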
Note: If your originalData is a multi-line string, then in Groovy you can include it as follows:
def originalData =
"""
ID : AB-223
Product : Standard Profile
Start Date : 2020-11-19 00:00:00
Subscription : Annual
Volume : 11
Page URL : null
Commitment : 1200.00
Actual Start Date : 2020-11-25 00:00:00
"""
This script looks simple, but it took me a good while to figure out a few things, so I'm posting it here.
Upvotes: 0
Reputation: 1434
One more single-line solution, in addition to tim_yates's:
def result = articleContent.replaceAll(/<!\[CDATA\[(.+)]]>/,/$1/)
Please take into account that if the regexp doesn't match, result will be equal to the source string. By contrast, with
def result = (articleContent =~ /<!\[CDATA\[(.+)]]>/)[0][1]
it will raise an exception.
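The difference can be checked with a small sketch. The `plain` string and the `extractCdata` closure are hypothetical names introduced only for this illustration; the closure shows one possible way to get `null` instead of either the source string or an exception.

```groovy
def cdata = '<![CDATA[ Hellow World ]]>'
def plain = 'no cdata here'          // hypothetical input without a CDATA section
def pattern = /<!\[CDATA\[(.+)]]>/

// replaceAll returns the captured text on a match ...
assert cdata.replaceAll(pattern, '$1') == ' Hellow World '
// ... and the unchanged source string when nothing matches
assert plain.replaceAll(pattern, '$1') == plain

// Indexing a matcher that found nothing throws IndexOutOfBoundsException
def threw = false
try {
    (plain =~ pattern)[0][1]
} catch (IndexOutOfBoundsException e) {
    threw = true
}
assert threw

// Hypothetical helper: returns the trimmed content, or null when there is no match
def extractCdata = { String s ->
    def m = (s =~ pattern)
    m.find() ? m.group(1).trim() : null
}
assert extractCdata(cdata) == 'Hellow World'
assert extractCdata(plain) == null
```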
Upvotes: 2
Reputation: 143
A little bit late to the party, but try using a backslash when defining your pattern, for example:
def articleContent = "real groovy"
def matches = (articleContent =~ /gr\w{4}/) //grabs 'gr' and its following 4 chars
def firstmatch = matches[0] //firstmatch would be 'groovy'
You were on the right track; it was just the pattern definition that needed to be altered.
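To illustrate why the slashy form is convenient, here is a minimal sketch comparing it with a double-quoted string and showing the difference between the `=~` and `==~` operators. The variable names `slashy`, `quoted` and `text` are only illustrative.

```groovy
// In a slashy string a backslash is passed through as-is;
// in a double-quoted string it has to be doubled
def slashy = /gr\w{4}/
def quoted = "gr\\w{4}"
assert slashy == quoted

def text = 'real groovy'

// =~ builds a Matcher, which can find the pattern anywhere in the text
assert (text =~ slashy).find()
assert (text =~ slashy)[0] == 'groovy'

// ==~ is true only when the whole string matches the pattern
assert !(text ==~ slashy)
assert 'groovy' ==~ slashy
```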
References:
https://www.regular-expressions.info/groovy.html
http://mrhaki.blogspot.com/2009/09/groovy-goodness-matchers-for-regular.html
Upvotes: 2
Reputation: 1531
The code below shows substring extraction using a regex in Groovy:
class StringHelper {
    // @NonCPS is a Jenkins Pipeline annotation; remove it when running outside Jenkins
    @NonCPS
    static String stripSshPrefix(String gitUrl) {
        // capture everything after the "ssh://" prefix
        def match = (gitUrl =~ /ssh:\/\/(.+)/)
        if (match.find()) {
            return match.group(1)
        }
        // no prefix found: return the URL unchanged
        return gitUrl
    }

    static void main(String... args) {
        def gitUrl = "ssh://[email protected]:jiahut/boot.git"
        def gitUrl2 = "[email protected]:jiahut/boot.git"
        println(stripSshPrefix(gitUrl))
        println(stripSshPrefix(gitUrl2))
    }
}
Upvotes: 11
Reputation: 171194
Try:
def result = (articleContent =~ /<!\[CDATA\[(.+)]]>/)[ 0 ][ 1 ]
However, I worry that you are planning to parse XML with regular expressions. If this CDATA section is part of a larger valid XML document, it is better to use an XML parser.
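As a rough sketch of the parser route, assume the CDATA sits inside a hypothetical `<article><content>` wrapper (these element names are made up for illustration). The JDK's built-in DOM parser can then read the text without any regex:

```groovy
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical document; only the CDATA payload comes from the question
def xml = '<article><content><![CDATA[ Hellow World ]]></content></article>'

def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
def doc = builder.parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))

// textContent returns the element's text, including text held in CDATA sections
def content = doc.getElementsByTagName('content').item(0).textContent
assert content == ' Hellow World '
assert content.trim() == 'Hellow World'
```

Groovy's own `XmlSlurper` is another option, but its package differs between Groovy versions, so the JDK parser is used here to keep the sketch version-independent.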
Upvotes: 76