Reputation: 11
First time posting here but not the first time using Stack Overflow as a resource. Must say this site has been integral to my work in general.
I have used sed in so many ways before, but I can't figure out how to return a full XML node if, and only if, one of its child nodes meets certain criteria. I know how to use the two-address convention (/START/,/END/command), but I need to restrict the result to nodes that have a specific matching child.
Example:
<entity id="001">
<name>Jane Doe</name>
<country>US</country>
</entity>
<entity id="002">
<name>Jose Reyes</name>
<country>Mexico</country>
</entity>
<entity id="003">
<name>Juan Dela Cruz</name>
<country>Philippines</country>
</entity>
<entity id="004">
<name>William Shatner</name>
<country>US</country>
</entity>
If I want to return the full entity node with id 003, I can use the following command:
sed -n '/entity id="003"/,/<\/entity>/p'
However, if I want to return the full entity nodes that match the country US, how should I go about that?
I don't mind doing the work myself if you can point me in a general direction. In fact, I prefer that over spoon-feeding.
Thanks!
Upvotes: 0
Views: 619
Reputation: 10865
As you may have seen in comments on similar questions, the best thing for processing XML is a tool made for processing XML, and not a general text processing tool like sed or awk.
For example, if you have access to xmlstarlet:
$ xmlstarlet sel -t -c "//entity[country = 'US']" file.xml
<entity id="001">
<name>Jane Doe</name>
<country>US</country>
</entity><entity id="004">
<name>William Shatner</name>
<country>US</country>
</entity>
Especially if you're going to be working with XML more than a little bit, I would put the effort into researching the available command line tools more suited for parsing XML.
If you're really stuck, then awk would be a better option than sed, and awk should be available anywhere sed is:
$ cat a.awk
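# Start of an entity: begin collecting lines into s
/<entity id/ { f = 1; s = "" }
# While inside an entity, append the current line to s
f { s = s ? (s ORS $0) : $0 }
# Remember that this entity has the wanted country
/<country>US</ { f = 2 }
# End of the entity: print it only if the country matched
/<\/entity>/ {
    if (f == 2) print s
    f = 0
}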
$ awk -f a.awk file.xml
<entity id="001">
<name>Jane Doe</name>
<country>US</country>
</entity>
<entity id="004">
<name>William Shatner</name>
<country>US</country>
</entity>
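Since the question asked about sed specifically, the same buffering idea can be sketched with sed's hold space. This is only a rough sketch, not a robust XML solution: it assumes every tag sits on its own line exactly as in the sample data, and the function name us_entities is made up for illustration.

```shell
# Hypothetical helper: print every <entity> block whose <country> is US.
# Assumes one tag per line, as in the sample data from the question.
#   h;d  on the opening tag: start a fresh buffer in the hold space
#   H    on every other line: append the line to that buffer
#   g    on the closing tag: copy the buffer back, print it if it matches
us_entities() {
  sed -n '
    /<entity /{h;d;}
    H
    /<\/entity>/{
      g
      /<country>US<\/country>/p
    }
  ' "$1"
}
```

Calling us_entities file.xml on the sample input should print the same two entities as the awk version. It breaks as soon as the formatting changes (several tags on one line, attributes in a different order, nested entities), which is why a real XML tool is still the better choice.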
Upvotes: 1