johan

Reputation: 824

Conditional extraction of XML attributes with xmlstarlet

I have some XML (say, file minimal.xml) that contains error and warning messages in the following format:

<?xml version="1.0" encoding="UTF-8"?>
  <messages>
     <message subMessage="RSC-004">RSC-004, ERROR, [File 'OEBPS/Text/pdfMigration.html' could not be decrypted.], epub20_encryption_binary_content.epub</message>
     <message subMessage="RSC-012">RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (24-67)</message>
     <message subMessage="RSC-012">RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (30-82)</message>
     <message subMessage="RSC-012">RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (36-81)</message>
     <message subMessage="RSC-012">RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (42-75)</message>
     <message subMessage="RSC-012">RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (48-61)</message>
     <message subMessage="HTM-023">HTM-023, WARN, [An invalid XHTML Named Entity was found: '&amp;0;'.], OEBPS/Text/pdfMigration.html (18-199)</message>
     <message subMessage="HTM-023">HTM-023, WARN, [An invalid XHTML Named Entity was found: '&amp;l0xb'.], OEBPS/Text/pdfMigration.html (291-6)</message>
  </messages>

I'm looking for a way to extract the subMessage attribute value for all message elements that represent an ERROR (which can be identified from the presence of ERROR in the message element's text value). I'm using xmlstarlet. After some searching I found this somewhat similar case, so I adapted that as follows:

xmlstarlet sel -t -v '/messages[contains(message,"ERROR")]/message/@subMessage' minimal.xml

Result:

RSC-004
RSC-012
RSC-012
RSC-012
RSC-012
RSC-012
HTM-023
HTM-023

This is not what I expected, since these are the subMessage values of all message elements! As a further test I modified the query to extract only warnings:

xmlstarlet sel -t -v '/messages[contains(message,"WARN")]/message/@subMessage' minimal.xml

In this case the result is empty! I'm fairly new to xmlstarlet and I suspect I'm overlooking something obvious here. Any help greatly appreciated!

BTW some info on the xmlstarlet version I'm using:

compiled against libxml2 2.9.2, linked with 20903
compiled against libxslt 1.1.28, linked with 10128

Upvotes: 2

Views: 240

Answers (2)

Daniel Haley

Reputation: 52858

You need to move the predicate to the message, like this:

xmlstarlet sel -t -v "/messages/message[contains(.,'WARN')]/@subMessage" minimal.xml
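As a cross-check outside xmlstarlet, the same per-element filter can be sketched with Python's standard library. This is only a sketch: ElementTree's XPath subset has no contains(), so the text test is done in Python instead. SAMPLE is a shortened stand-in for minimal.xml, and sub_messages is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for minimal.xml from the question (assumption: same structure).
SAMPLE = """<messages>
  <message subMessage="RSC-004">RSC-004, ERROR, [File 'OEBPS/Text/pdfMigration.html' could not be decrypted.], epub20_encryption_binary_content.epub</message>
  <message subMessage="RSC-012">RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (24-67)</message>
  <message subMessage="HTM-023">HTM-023, WARN, [An invalid XHTML Named Entity was found: '&amp;0;'.], OEBPS/Text/pdfMigration.html (18-199)</message>
</messages>"""


def sub_messages(xml_text, level):
    """Return the subMessage attribute of every <message> whose text contains `level`.

    Hypothetical helper mirroring /messages/message[contains(., level)]/@subMessage.
    """
    root = ET.fromstring(xml_text)  # root is the <messages> element
    return [
        m.get("subMessage")
        for m in root.findall("message")   # each <message> is tested on its own
        if level in "".join(m.itertext())  # string value of that one element
    ]


if __name__ == "__main__":
    print(sub_messages(SAMPLE, "ERROR"))
    print(sub_messages(SAMPLE, "WARN"))
```

The key point is the same as in the XPath: the test runs once per message element, not once for the whole messages element.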

Upvotes: 2

zx485

Reputation: 29022

Try this:

xmlstarlet sel -t -v '/messages/message[contains(.,"ERROR")]/@subMessage' minimal.xml

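With /messages[contains(message,"WARN")] you applied the predicate to the messages element rather than to each message element. In XPath 1.0, contains() converts the node-set message to a string by taking the string value of its first node only, so the test looks at just the first message. That first message contains ERROR, which is why the ERROR query returned every subMessage value and the WARN query returned nothing.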
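To illustrate why the original queries returned everything for ERROR and nothing for WARN, here is a rough Python emulation of how that expression is evaluated. It assumes the XPath 1.0 rule that a node-set passed to a string function is converted using only its first node in document order. SAMPLE is a shortened stand-in for minimal.xml, and original_query is a made-up name, not anything xmlstarlet provides.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for minimal.xml from the question (assumption: same structure).
SAMPLE = """<messages>
  <message subMessage="RSC-004">RSC-004, ERROR, [File 'OEBPS/Text/pdfMigration.html' could not be decrypted.], epub20_encryption_binary_content.epub</message>
  <message subMessage="RSC-012">RSC-012, ERROR, [Fragment identifier is not defined.], OEBPS/toc.ncx (24-67)</message>
  <message subMessage="HTM-023">HTM-023, WARN, [An invalid XHTML Named Entity was found: '&amp;0;'.], OEBPS/Text/pdfMigration.html (18-199)</message>
</messages>"""


def original_query(xml_text, needle):
    """Emulate /messages[contains(message, needle)]/message/@subMessage (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # the <messages> element
    messages = root.findall("message")
    # XPath 1.0: converting the node-set `message` to a string uses only
    # the first node in document order.
    first_text = "".join(messages[0].itertext()) if messages else ""
    if needle not in first_text:
        return []  # the predicate on <messages> is false, so nothing is selected
    # The predicate is true for <messages> as a whole, so every child is returned.
    return [m.get("subMessage") for m in messages]


if __name__ == "__main__":
    print(original_query(SAMPLE, "ERROR"))  # all values, including the WARN entry
    print(original_query(SAMPLE, "WARN"))   # empty: the first message is an ERROR
```

With the sample above this reproduces the behaviour described in the question: the ERROR query yields every subMessage value and the WARN query yields none.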

Upvotes: 2
