user1697575

Reputation: 2848

RegEx to extract first XML element name with optional namespace prefix

I need to use a regex to extract the name of the first element in an XML document, ignoring the optional namespace prefix.

Here is the sample XML1:

<ns1:Monkey xmlns="http://myurlisrighthereheremonkey.com/monkeynamespace">
 <foodType>
  <vegtables>
   <carrots>1</carrots>
  </vegtables>
 </foodType>
</ns1:Monkey>

And here is a similar XML without the namespace prefix, XML2:

 <Monkey xmlns="http://myurlisrighthereheremonkey.com/monkeynamespace">
 <foodType>
  <vegtables>
   <carrots>1</carrots>
  </vegtables>
 </foodType>
</Monkey>

I need a regex that returns "Monkey" for either XML1 or XML2.

So far I have tried the regex <(\w+:)(\w+), which works for XML1, but I don't know how to make it work for XML2.

Upvotes: 1

Views: 3103

Answers (1)

Wiktor Stribiżew

Reputation: 627409

Since this seems to be a one-time job and you really do not have access to an XML parser, you can use either of the following two regexes (they will only work for XML files like the samples you provided):

<(\w+:)?(\w+)(?=\s*xmlns="http://myurlisrighthereheremonkey\.com/monkeynamespace")

Or, if you run the regex against the entire contents of a single file:

^\s*<(\w+:)?(\w+)

There are two main changes:

  • (\w+:)? - adding the ? quantifier makes the first capturing group (the namespace prefix) optional, so the element name is always in group 2.
  • ^\s* anchors the match at the beginning of the string (I assume there is no XML declaration there). Alternatively, the look-ahead (?=\s*xmlns="http://myurlisrighthereheremonkey\.com/monkeynamespace") allows a match only if it is followed by optional whitespace and the literal xmlns="http://myurlisrighthereheremonkey.com/monkeynamespace".
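To make the two patterns concrete, here is a minimal sketch in Python (the question does not say which language is in use, so Python is an assumption). The helper names first_element_name and first_element_name_by_namespace are hypothetical, and the sample strings are shortened versions of XML1 and XML2.

```python
import re

# Anchored pattern: optional "prefix:" followed by the element name,
# matched at the very start of the input (leading whitespace allowed).
ANCHORED = re.compile(r'^\s*<(\w+:)?(\w+)')

# Look-ahead pattern: the element must be followed by the specific
# default-namespace declaration used in the sample documents.
BY_NAMESPACE = re.compile(
    r'<(\w+:)?(\w+)'
    r'(?=\s*xmlns="http://myurlisrighthereheremonkey\.com/monkeynamespace")'
)

def first_element_name(xml_text):
    """Return the first element's name without its prefix, or None."""
    match = ANCHORED.search(xml_text)
    return match.group(2) if match else None

def first_element_name_by_namespace(xml_text):
    """Same idea, but keyed on the xmlns attribute rather than position."""
    match = BY_NAMESPACE.search(xml_text)
    return match.group(2) if match else None

# Shortened versions of XML1 and XML2 from the question.
xml1 = ('<ns1:Monkey xmlns="http://myurlisrighthereheremonkey.com/monkeynamespace">\n'
        ' <foodType/>\n'
        '</ns1:Monkey>')
xml2 = (' <Monkey xmlns="http://myurlisrighthereheremonkey.com/monkeynamespace">\n'
        ' <foodType/>\n'
        '</Monkey>')

for sample in (xml1, xml2):
    print(first_element_name(sample), first_element_name_by_namespace(sample))
```

In both cases the element name is read from group 2, because group 1 holds the optional prefix (and is None when there is no prefix).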

However, you really should consider switching to code that uses a proper XML parser. It will make life easier for you and for whoever maintains the code later.
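For comparison, here is a rough sketch of the parser-based approach, assuming Python and its standard xml.etree.ElementTree module are available (the question does not say so). The function name root_local_name is hypothetical. Note that XML1 as posted uses the ns1 prefix without declaring it, which a namespace-aware parser rejects, so the sample below assumes a corrected document that declares xmlns:ns1.

```python
import xml.etree.ElementTree as ET

def root_local_name(xml_text):
    """Return the root element's name with any namespace stripped."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace-uri}localname";
    # tags without a namespace have no braces and are returned unchanged.
    return root.tag.rsplit('}', 1)[-1]

NS = "http://myurlisrighthereheremonkey.com/monkeynamespace"

# Assumed well-formed variants of the samples from the question.
xml1 = f'<ns1:Monkey xmlns:ns1="{NS}"><foodType/></ns1:Monkey>'
xml2 = f'<Monkey xmlns="{NS}"><foodType/></Monkey>'

print(root_local_name(xml1))
print(root_local_name(xml2))
```

Unlike the regexes, this does not depend on where the root element sits in the file, so it keeps working if an XML declaration or a comment precedes it.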

Upvotes: 2
