JInu Thomas

Reputation: 61

Extract part of an xml tag using Sed

I need to extract part of some XML data that arrives on stdin, using a shell script.

Input data is pasted below.

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
    <soapenv:Header>
           <ns7:ClientInfoHeader xmlns:ns7="urn:messages.test.example.com/v1" soapenv:mustUnderstand="0">
             <ns7:AppID>example</ns7:AppID>
        </ns7:ClientInfoHeader>
        <wsse:Security xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" mustUnderstand="1">
          <wsse:UsernameToken>
              <wsse:Username>testuser</wsse:Username>
          </wsse:UsernameToken>
        </wsse:Security>
    </soapenv:Header>
  <soapenv:Body>
        <ns7:CSV xmlns:ns7="urn:messages.test.example.com/v1">
                    <ns7:Que>SELECT * from Test</ns7:Que>
        </ns7:CSV>
     </soapenv:Body>
</soapenv:Envelope>

I need to extract the namespace version, v1, from the input above. That is, the v1 at the end of

"urn:messages.test.example.com/v1"

I can only use the sed utility.

Your help is much appreciated.

Upvotes: 2

Views: 181

Answers (1)

danlei

Reputation: 14291

Note that parsing XML and other recursive data with regexen is often a bad idea, and a proper parser is the better solution. (For example: what if your search string occurs somewhere you didn't expect it, like in a comment or as part of a string?) If you're not aware of this, look it up.

One possibility to extract all versions after xmlns:ns7="urn:messages.test.example.com/, assuming the version format is always v followed by a number:

sed -rne 's/.*xmlns:ns7="urn:messages\.test\.example\.com\/(v[0-9]+)".*/\1/p' input.xml
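To illustrate what "all versions" means here: the sample message declares xmlns:ns7 on two lines (ClientInfoHeader and CSV), so the substitution should print the version once per matching line. The sketch below is only an illustration. It feeds a trimmed two-line stand-in for the input on stdin, and it uses -E in place of -r on the assumption that your sed accepts it (recent GNU sed and BSD sed do).

```shell
# Trimmed stand-in for the question's input: only the two lines that declare ns7.
# -n suppresses default output, -E enables extended regexes, and the p flag
# prints each line on which the substitution succeeded.
versions=$(printf '%s\n' \
  '<ns7:ClientInfoHeader xmlns:ns7="urn:messages.test.example.com/v1" soapenv:mustUnderstand="0">' \
  '<ns7:CSV xmlns:ns7="urn:messages.test.example.com/v1">' |
  sed -nE 's/.*xmlns:ns7="urn:messages\.test\.example\.com\/(v[0-9]+)".*/\1/p')

# One output line per matching input line.
printf '%s\n' "$versions"
```

With this stand-in input the result is two lines, each containing v1.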

If you only need the first match:

sed -rne '/.*xmlns:ns7="urn:messages\.test\.example\.com\/(v[0-9]+)".*/{s//\1/p;q;}' input.xml
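Since the question says the data arrives on stdin, the first-match variant can also be used without a file argument. Here is a minimal sketch of that, wrapped in a hypothetical helper named extract_ns_version (the name is made up for illustration). It assumes a sed that accepts -E, and it relies on the empty pattern in s// reusing the address regex.

```shell
# Hypothetical helper: print the first ns7 namespace version read from stdin.
extract_ns_version() {
  # On the first line matching the address, reduce the line to the captured
  # version, print it, and quit so later declarations are ignored.
  sed -nE '/.*xmlns:ns7="urn:messages\.test\.example\.com\/(v[0-9]+)".*/{
    s//\1/p
    q
  }'
}

# Usage: pipe the SOAP message in; two declarations, one line of output.
first=$(printf '%s\n' \
  '<soapenv:Header>' \
  '<ns7:ClientInfoHeader xmlns:ns7="urn:messages.test.example.com/v1" soapenv:mustUnderstand="0">' \
  '<ns7:CSV xmlns:ns7="urn:messages.test.example.com/v1">' |
  extract_ns_version)
printf '%s\n' "$first"
```

In a real script you would pipe the actual message into the function, or call sed directly at the end of an existing pipeline.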

Upvotes: 1
