Reputation: 61
I need to extract part of the XML data available on stdin using a shell script.
The input data is pasted below.
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Header>
<ns7:ClientInfoHeader xmlns:ns7="urn:messages.test.example.com/v1" soapenv:mustUnderstand="0">
<ns7:AppID>example</ns7:AppID>
</ns7:ClientInfoHeader>
<wsse:Security xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" mustUnderstand="1">
<wsse:UsernameToken>
<wsse:Username>testuser</wsse:Username>
</wsse:UsernameToken>
</wsse:Security>
</soapenv:Header>
<soapenv:Body>
<ns7:CSV xmlns:ns7="urn:messages.test.example.com/v1">
<ns7:Que>SELECT * from Test</ns7:Que>
</ns7:CSV>
</soapenv:Body>
</soapenv:Envelope>
I need to extract the namespace version, v1, from the above input, that is, the v1 in "urn:messages.test.example.com/v1".
I can only use the sed utility.
Your help is much appreciated
Upvotes: 2
Views: 181
Reputation: 14291
Note that parsing XML and other recursive data with regexen is often a bad idea, and a proper parser is the better solution. (For example: what if your search string occurs somewhere you didn't expect it, like in a comment or as part of a string?) If you're not aware of this, look it up.
One possibility to extract all versions after xmlns:ns7="urn:messages.test.example.com/, assuming the version format is always v followed by a number:
sed -rne 's/.*xmlns:ns7="urn:messages\.test\.example\.com\/(v[0-9]+)".*/\1/p' input.xml
If you only need the first match:
sed -rne '/.*xmlns:ns7="urn:messages\.test\.example\.com\/(v[0-9]+)".*/{s//\1/p;q;}' input.xml
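Since the question says the data arrives on stdin, here is a minimal sketch of the same first-match command reading from a pipe instead of input.xml. The function name extract_version and the trimmed sample in $xml are made up for illustration, and -E is used as the more portable spelling of -r (an assumption about your sed; both GNU and BSD sed accept it).

```shell
# Hypothetical sample input: a trimmed copy of the envelope from the question.
xml='<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<ns7:CSV xmlns:ns7="urn:messages.test.example.com/v1">
</ns7:CSV>
</soapenv:Body>
</soapenv:Envelope>'

# Hypothetical helper: reads XML on stdin and prints the first ns7 version.
# -n suppresses automatic printing, -E enables extended regular expressions.
# On the first line matching the address, s//\1/p reuses that same regex,
# replaces the whole line with the captured version and prints it; q then quits.
extract_version() {
  sed -nE '/.*xmlns:ns7="urn:messages\.test\.example\.com\/(v[0-9]+)".*/{s//\1/p;q;}'
}

version=$(printf '%s\n' "$xml" | extract_version)
echo "$version"
```

In a real script you would drop $xml and pipe the actual data into the sed command directly. If no line matches, nothing is printed and version ends up empty, so check for that before using it.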
Upvotes: 1