hacktek

Reputation: 13

Parsing an xml with xmlstarlet

I'm really new to XML parsing and need some help. Suppose I have an XML file such as this one:

<?xml version="1.0" encoding="UTF-8"?>
<top>
  <node:config xmlns:node="uri:example.com/somepath/node/10.0" xmlns:this="uri:example.com/somepath/this/10.0" xmlns:that="uri:example.com/somepath/node/10.0" xmlns:thus="uri:example.com/somepath/thus/10.0" xmlns:an="uri:example.com/somepath/an/10.0">
    <this:is.a.test>on</this:is.a.test>
    <that:was.some.thing>off</that:was.some.thing>
    <thus:can name="species" value="value1 value2">
      <an:idea.for.something>on</an:idea.for.something>
      <an:idea.for.something.else>on</an:idea.for.something.else>
    </thus:can>
    <thus:can name="monkey" value="value3 value4">
      <an:idea.for.something>off</an:idea.for.something>
      <an:idea.for.something.else>off</an:idea.for.something.else>
    </thus:can>
  </node:config>
</top>

How would I go about printing everything inside a thus:can element when name="species" and value contains "value1", for example?

Thanks!

Upvotes: 0

Views: 553

Answers (1)

PBI

Reputation: 335

To select the whole block thus:can:

xmlstarlet sel -N thus="uri:example.com/somepath/thus/10.0" -t -c '//thus:can'

Next, refine to only those having the attribute name="species":

xmlstarlet sel -N thus="uri:example.com/somepath/thus/10.0" -t -c '//thus:can[@name="species"]'

Or those whose value attribute contains the string "value1" anywhere:

xmlstarlet sel -N thus="uri:example.com/somepath/thus/10.0" -t -c '//thus:can[contains(@value,"value1")]'

And the two restrictions combined:

xmlstarlet sel -N thus="uri:example.com/somepath/thus/10.0" -t -c '//thus:can[@name="species" and contains(@value,"value1")]'
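As a rough end-to-end sketch of the combined query: the file name config.xml and the shell variables ns, xpath, block and values are made up for illustration, and the XML is a trimmed copy of the one in the question. It assumes xmlstarlet is installed under that name and passes the file as the last argument.

```shell
#!/bin/sh
# Hypothetical file name; a trimmed copy of the XML from the question.
cat > config.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<top>
  <node:config xmlns:node="uri:example.com/somepath/node/10.0" xmlns:thus="uri:example.com/somepath/thus/10.0" xmlns:an="uri:example.com/somepath/an/10.0">
    <thus:can name="species" value="value1 value2">
      <an:idea.for.something>on</an:idea.for.something>
      <an:idea.for.something.else>on</an:idea.for.something.else>
    </thus:can>
    <thus:can name="monkey" value="value3 value4">
      <an:idea.for.something>off</an:idea.for.something>
      <an:idea.for.something.else>off</an:idea.for.something.else>
    </thus:can>
  </node:config>
</top>
EOF

# Namespace binding and XPath kept in variables so both queries share them.
ns='thus=uri:example.com/somepath/thus/10.0'
xpath='//thus:can[@name="species" and contains(@value,"value1")]'

# -c copies each matching element as XML; -n ends the output with a newline.
block=$(xmlstarlet sel -N "$ns" -t -c "$xpath" -n config.xml)
printf '%s\n' "$block"

# -m iterates over the child elements of the match; -v . prints each one's text.
values=$(xmlstarlet sel -N "$ns" -t -m "$xpath/*" -v . -n config.xml)
printf '%s\n' "$values"
```

The first query prints the matching thus:can element as XML; the second prints only the text of its children, which may be closer to "everything inside" if the markup itself is not wanted.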

Beware that contains() matches substrings, so your value=.. attribute should have well-defined internal delimiters to avoid matching an unwanted substring. For example, with:

... value="apple grapefruit" ...
... value="monkey ape chimp" ...

searching for contains(@value,"ape") will match both values (because grapefruit contains ape). Add a delimiter, e.g. a colon, between the items and also at the start and end:

.... value=":apple:grapefruit:" 

and search with:

contains(@value,":ape:")

This does not match that value, only values with a real ape item, such as :monkey:ape:chimp:.
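If the data cannot be changed and the items are separated by spaces, as in the question, a common XPath 1.0 idiom is to pad both the attribute and the search term with spaces so that only whole items match. A minimal sketch, assuming xmlstarlet is installed; the file name fruits.xml and the name attributes "fruit" and "primate" are made up for illustration:

```shell
#!/bin/sh
# Hypothetical sample file built from the two example values above.
cat > fruits.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<top xmlns:thus="uri:example.com/somepath/thus/10.0">
  <thus:can name="fruit" value="apple grapefruit"/>
  <thus:can name="primate" value="monkey ape chimp"/>
</top>
EOF

ns='thus=uri:example.com/somepath/thus/10.0'

# Plain substring test: also matches "grapefruit".
naive=$(xmlstarlet sel -N "$ns" -t \
  -m '//thus:can[contains(@value,"ape")]' -v '@name' -n fruits.xml)

# Whole-item test: normalize-space() collapses runs of whitespace, and
# concat() adds a space at each end, so " ape " only matches a complete item.
token=$(xmlstarlet sel -N "$ns" -t \
  -m '//thus:can[contains(concat(" ", normalize-space(@value), " "), " ape ")]' \
  -v '@name' -n fruits.xml)

printf 'substring match:\n%s\n' "$naive"
printf 'whole-item match:\n%s\n' "$token"
```

With this sample, the substring query selects both elements, while the padded query selects only the one whose value really lists ape.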

Upvotes: 1
