Reputation: 31
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE ... ]>
<abc-config version="THIS" id="abc">
...
</abc-config>
Hi all,
In the XML above, how can I extract the value of the version attribute using a regex in Groovy/Java?
Thanks.
Upvotes: 3
Views: 3435
Reputation: 171114
I know you asked for a regex, but what's wrong with this in Groovy?
Assuming the xml is something like:
def xml = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE abc-config>
<abc-config version="THIS" id="abc">
<node></node>
</abc-config>'''
Then I can parse it with:
def n = new XmlSlurper().parseText( xml )
And then this line:
println n.@version
Prints out "THIS"
If a more complex DOCTYPE causes the parse to fail, you can try turning off external DTD loading, either by setting features on the parser:
def parser = new XmlSlurper()
parser.setFeature( "http://apache.org/xml/features/nonvalidating/load-external-dtd", false )
parser.setFeature( "http://xml.org/sax/features/namespaces", false )
parser.parseText( xml )
or by using the two-argument constructor, new XmlSlurper(false, false), which turns off validation and namespace awareness.
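For the plain-Java half of the question, roughly the same approach can be sketched with the JDK's built-in DOM parser. This is only a sketch: the class and method names (VersionAttributeDom, readVersion) are made up for illustration, and it assumes the default JDK parser accepts the same load-external-dtd feature used above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
public class VersionAttributeDom {

    // Parses the XML text and returns the "version" attribute of the root element.
    static String readVersion(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Assumption: the default (Xerces-based) JDK parser supports this feature.
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        // getAttribute returns an empty string if the attribute is absent.
        return doc.getDocumentElement().getAttribute("version");
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
            + "<!DOCTYPE abc-config>\n"
            + "<abc-config version=\"THIS\" id=\"abc\">\n"
            + "<node></node>\n"
            + "</abc-config>";
        System.out.println(readVersion(xml));
    }
}
```

Like the XmlSlurper version, this reads the attribute from the root element, so the version in the XML declaration is never confused with it.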
Upvotes: 2
Reputation:
This is a Perl regex, not a Java one:
/<\w+\s+[^>]*?(?<=\s)version\s*=\s*["'](.+?)["'][^>]*?\s*\/?>/sg
Note that this fails on many levels. I could fill the page with a proper regex, but I don't have the desire.
This one fails too:
/<\w+\s+[^>]*?(?<=\s)version\s*=\s*(".+?"|'.+?')[^>]*?\s*\/?>/sg
So does this:
/<\w+\s+[^>]*?(?<=\s)version\s*=\s*(["'])(.+?)\1[^>]*?\s*\/?>/sg
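For comparison, here is a rough Java rendering of the backreference idea in the last pattern. It is a sketch under assumptions, not a robust XML matcher: the class and method names are hypothetical, and the element name abc-config is hard-coded because, as far as I can tell, \w+ does not match the hyphen in abc-config, so the patterns above would not match the sample element at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class VersionAttributeRegex {

    // Group 1 captures the opening quote; \1 requires the same closing quote.
    // Group 2 captures the attribute value.
    private static final Pattern VERSION = Pattern.compile(
        "<abc-config\\b[^>]*?\\sversion\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.DOTALL);

    // Returns the version attribute of the first abc-config start tag, or null.
    static String extractVersion(String xml) {
        Matcher m = VERSION.matcher(xml);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
            + "<!DOCTYPE abc-config>\n"
            + "<abc-config version=\"THIS\" id=\"abc\">\n"
            + "<node></node>\n"
            + "</abc-config>";
        System.out.println(extractVersion(xml)); // prints THIS
    }
}
```

It still fails in the usual ways: comments, CDATA sections, or a > inside another attribute value will all confuse it.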
Upvotes: 0
Reputation: 35808
A regex could handle this. For example, this one captures the version number from the XML declaration:
/<\?xml version="([0-9.]+)"/
I'll spare you one of the 10000 lectures about not using a regex to parse markup languages.
Edit: The One whose Name cannot be expressed in the Basic Multilingual Plane, He compelled me.
Upvotes: 2