Reputation: 25
I'm new to sed.
I want to strip all the XML tags and extract the string between one specific tag, reportBody.
Here is how it looks on a single line:
<?xml version="1.0" ?><SOAP- ENV:Envelope xmlns:SOAP-ENV="blablah"><SOAP-ENV:Body> <getReportResponsexmlns:msgns="blahblahblah" xmlns="blahblah"><returnxmlns=""> <returnCode><majorReturnCode>000</majorReturnCode><minorReturnCode>0000</minorReturnCode><returnCode><reportName>blahblah</reportName><reportTitle>blahblahblahr</reportTitle><reportBody>STRING TO EXTRACT</reportBody><reportMimeType>text/csv</reportMimeType></return></getReportResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>
The problem is that the XML file can vary: sometimes it is written on a single line, sometimes on 2-3 lines, and sometimes the string to extract spans more than one line inside the reportBody tag. So it can look like this, or different again:
<?xml version="1.0" ?><SOAP- ENV:Envelope xmlns:SOAP-ENV="blablah"><SOAP-ENV:Body>
<getReportResponsexmlns:msgns="blahblahblah" xmlns="blahblah">
<returnxmlns=""> <returnCode>
<majorReturnCode>000</majorReturnCode><minorReturnCode>0000</minorReturnCode>
<returnCode>
<reportName>blahblah</reportName><reportTitle>blahblahblahr</reportTitle><reportBody>
STRING
TO
EXTRACT</reportBody>
<reportMimeType>text/csv</reportMimeType></return>
</getReportResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>
What is a solution that copes with all of these variations? Also, can I pass parameters to save the result to a file and decode the string from base64? Thanks!
Upvotes: 1
Views: 530
Reputation: 785128
You can use this GNU awk command to extract it (the output shown is for the multi-line input):
awk -v RS='<reportBody>.*</reportBody>' 'RT{print RT}' file.xml
<reportBody>
STRING
TO
EXTRACT</reportBody>
With the first (single-line) input you will get this output:
<reportBody>STRING TO EXTRACT</reportBody>
-v RS='<reportBody>.*</reportBody>' sets the input record separator to any text from <reportBody> to </reportBody>. GNU awk stores the text that matched the separator in RT, which is what gets printed.
To extract only the string inside the tags, use:
awk -v RS='<reportBody>.*</reportBody>' 'RT{
gsub(/<\/?reportBody>[[:space:]]*/, "", RT); print RT}' file.xml
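As for saving to a file and base64: here is a minimal sketch, not a definitive solution. It assumes the reportBody content is base64-encoded text and that a base64 utility accepting -d (as in GNU coreutils) is installed. The function name extract_report_body and its three arguments are hypothetical, made up for illustration. Note that it swaps the gawk RS/RT trick for plain POSIX awk with index() and substr(), so it should also run where gawk is not available.

```shell
#!/bin/sh
# Hypothetical helper: extract_report_body INPUT.xml OUTPUT [decode]
# Writes the text between <reportBody> and </reportBody> to OUTPUT.
# With a third argument "decode", the text is base64-decoded first
# (assumption: the body is base64 and `base64 -d` exists, as in GNU coreutils).
extract_report_body() {
  src=$1
  dst=$2
  mode=${3:-raw}

  # Slurp the whole file so the tags may sit on any line, then cut out
  # the text between the first <reportBody> and the first </reportBody>.
  body=$(awk '
    { buf = buf $0 "\n" }
    END {
      otag = "<reportBody>"; ctag = "</reportBody>"
      s = index(buf, otag)
      if (s == 0) exit 1                 # opening tag not found
      rest = substr(buf, s + length(otag))
      e = index(rest, ctag)
      if (e == 0) exit 1                 # closing tag not found
      body = substr(rest, 1, e - 1)
      # Trim leading and trailing whitespace, keep inner line breaks.
      gsub(/^[ \t\r\n]+|[ \t\r\n]+$/, "", body)
      print body
    }
  ' "$src") || return 1

  if [ "$mode" = decode ]; then
    printf '%s\n' "$body" | base64 -d > "$dst"
  else
    printf '%s\n' "$body" > "$dst"
  fi
}
```

Usage would be something like: extract_report_body file.xml report.csv decode. Leave off the last argument to save the raw string instead of decoding it.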
Upvotes: 1