Reputation: 2848
I need to use a regex to extract the name of the first element in an XML document, ignoring the optional namespace prefix.
Here is the sample XML1:
<ns1:Monkey xmlns="http://myurlisrighthereheremonkey.com/monkeynamespace">
<foodType>
<vegtables>
<carrots>1</carrots>
</vegtables>
</foodType>
</ns1:Monkey>
And here is similar XML that is without namespace, XML2:
<Monkey xmlns="http://myurlisrighthereheremonkey.com/monkeynamespace">
<foodType>
<vegtables>
<carrots>1</carrots>
</vegtables>
</foodType>
</Monkey>
I need a regex that returns "Monkey" for both XML1 and XML2.
So far I have tried the regex <(\w+:)(\w+), which works for XML1, but I don't know how to make it work for XML2.
Upvotes: 1
Views: 3103
Reputation: 627409
Since this seems to be a one-time job and you really do not have access to an XML parser, you can use either of these two regexes (they will only work for XML files like the samples you provided):
<(\w+:)?(\w+)(?=\s*xmlns="http://myurlisrighthereheremonkey\.com/monkeynamespace")
Or, if you run the regex against the entire contents of a single file:
^\s*<(\w+:)?(\w+)
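Both patterns can be tried against the two samples from the question. Here is a minimal sketch in Python; the language and the helper name first_element_name are my assumptions, since the question does not say which regex engine is in use. The element name ends up in capturing group 2 in both cases.

```python
import re

# The two patterns from this answer, unchanged.
LOOKAHEAD = re.compile(
    r'<(\w+:)?(\w+)(?=\s*xmlns="http://myurlisrighthereheremonkey\.com/monkeynamespace")'
)
ANCHORED = re.compile(r'^\s*<(\w+:)?(\w+)')

# Samples from the question (XML1 has a prefix, XML2 does not).
XML1 = '''<ns1:Monkey xmlns="http://myurlisrighthereheremonkey.com/monkeynamespace">
<foodType>
<vegtables>
<carrots>1</carrots>
</vegtables>
</foodType>
</ns1:Monkey>'''

XML2 = '''<Monkey xmlns="http://myurlisrighthereheremonkey.com/monkeynamespace">
<foodType>
<vegtables>
<carrots>1</carrots>
</vegtables>
</foodType>
</Monkey>'''


def first_element_name(pattern, text):
    """Hypothetical helper: return group 2 (the local name) or None."""
    match = pattern.search(text)
    return match.group(2) if match else None


for pattern in (LOOKAHEAD, ANCHORED):
    for sample in (XML1, XML2):
        print(first_element_name(pattern, sample))
```

Group 1 holds the prefix including the colon (for example "ns1:") when one is present, and is empty otherwise.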
There are two main changes:
(\w+:)? : adding the ? quantifier makes the first capturing group optional, so the namespace prefix may be absent.
^\s* : makes the regex match only at the beginning of the string (I guess you do not have an XML declaration there). Alternatively, the look-ahead (?=\s*xmlns="http://myurlisrighthereheremonkey\.com/monkeynamespace") forces a match only if the name is followed by optional whitespace and the literal xmlns="http://myurlisrighthereheremonkey.com/monkeynamespace".
However, you really should think about switching to code that supports XML parsing; it will make your life, and the lives of those who will maintain the code, easier.
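For comparison, this is roughly what the parser route could look like. It is only a sketch in Python using the standard library's xml.etree.ElementTree, and the function name root_local_name is made up for illustration. One assumption to note: XML1 as posted uses the ns1 prefix without declaring it, which a conforming parser rejects, so the sample below declares it with xmlns:ns1.

```python
import xml.etree.ElementTree as ET


def root_local_name(xml_text):
    """Hypothetical helper: name of the root element without its namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace-uri}localname",
    # so keep only the part after the closing brace (if there is one).
    return root.tag.rsplit('}', 1)[-1]


# Assumption: the prefix is declared (xmlns:ns1), unlike in the posted XML1.
PREFIXED = (
    '<ns1:Monkey xmlns:ns1="http://myurlisrighthereheremonkey.com/monkeynamespace">'
    '<foodType><vegtables><carrots>1</carrots></vegtables></foodType>'
    '</ns1:Monkey>'
)

# Default namespace, as in XML2.
UNPREFIXED = (
    '<Monkey xmlns="http://myurlisrighthereheremonkey.com/monkeynamespace">'
    '<foodType><vegtables><carrots>1</carrots></vegtables></foodType>'
    '</Monkey>'
)

print(root_local_name(PREFIXED))
print(root_local_name(UNPREFIXED))
```

This keeps working if an XML declaration, comments, or a different prefix appear before the root element, which is where the regexes above would start to fail.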
Upvotes: 2