Reputation: 824
I have some XML (say, file minimal.xml) that contains error and warning messages in the following format:
<?xml version="1.0" encoding="UTF-8"?>
<messages>
<message subMessage="RSC-004">RSC-004, ERROR, [File 'OEBPS/Text/pdfMigration.html' could not be decrypted.], epub20_encryption_binary_content.epub</message>
<message subMessage="RSC-012">RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (24-67)</message>
<message subMessage="RSC-012">RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (30-82)</message>
<message subMessage="RSC-012">RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (36-81)</message>
<message subMessage="RSC-012">RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (42-75)</message>
<message subMessage="RSC-012">RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (48-61)</message>
<message subMessage="HTM-023">HTM-023, WARN, [An invalid XHTML Named Entity was found: '&amp;0;'.], OEBPS/Text/pdfMigration.html (18-199)</message>
<message subMessage="HTM-023">HTM-023, WARN, [An invalid XHTML Named Entity was found: '&amp;l0xb'.], OEBPS/Text/pdfMigration.html (291-6)</message>
</messages>
I'm looking for a way to extract the subMessage attribute value for all message elements that represent an ERROR (which can be identified from the presence of ERROR in the message element's text value). I'm using xmlstarlet. After some searching I found this somewhat similar case, so I adapted that as follows:
xmlstarlet sel -t -v '/messages[contains(message,"ERROR")]/message/@subMessage' minimal.xml
Result:
RSC-004
RSC-012
RSC-012
RSC-012
RSC-012
RSC-012
HTM-023
HTM-023
This is not what I expected, since these are the subMessage values of all message elements! As a further test I modified the query to extract only warnings:
xmlstarlet sel -t -v '/messages[contains(message,"WARN")]/message/@subMessage' minimal.xml
In this case the result is empty! I'm fairly new to xmlstarlet and I suspect I'm overlooking something obvious here. Any help greatly appreciated!
BTW some info on the xmlstarlet version I'm using:
compiled against libxml2 2.9.2, linked with 20903
compiled against libxslt 1.1.28, linked with 10128
Upvotes: 2
Views: 240
Reputation: 52858
You need to move the predicate to the message element, like this:
xmlstarlet sel -t -v "/messages/message[contains(.,'WARN')]/@subMessage" minimal.xml
Upvotes: 2
Reputation: 29022
Try this:
xmlstarlet sel -t -v '/messages/message[contains(.,"ERROR")]/@subMessage' minimal.xml
With /messages[contains(message,"WARN")] the predicate is attached to the messages element, so it is evaluated once for the whole document and not once for each message element.
Upvotes: 2