Adrian

Reputation: 2656

Get URL from XML node with xmllint, add new line

I am extracting URLs from an XML file with this command:

xmllint --xpath '//ROOT/ITEM/PHOTO/text()' xml_2015-05-13-20\:39.xml

It works, but the output is one run-together block of URLs:

http://1.jpghttp://2.jpghttp://3.jpghttp://4.jpghttp://5.jpghttp://6.jpg

Is it possible to add a \n newline character after each match?

XML:

<ROOT>
   <ITEM>
      <PHOTO>http://1.jpg</PHOTO>
   </ITEM>
   <ITEM>
      <PHOTO>http://2.jpg</PHOTO>
   </ITEM>
</ROOT>

Upvotes: 2

Views: 2244

Answers (3)

shellter

Reputation: 37318

Get XMLStarlet and try

 xmlstarlet sel -t -m "/ROOT/ITEM/PHOTO" -v . -n xml_2015-05-13-20\:39.xml 
            |   |  |                     |    |
            |   |  |                     |    -n ... add new-line after printed element
            |   |  |                     -v .  print the value of the matched node
            |   |  -m match this Xpath
            |   -t  (select) using a template (the -m part)
            sel(ect) 

xmlstarlet is designed for command-line processing and scripting, whereas xmllint does not list such uses as a top priority.

Upvotes: 3

Christoph

Reputation: 735

With xmllint itself it is not possible, as others already stated.

But with the help of tools like sed you can achieve what you want:

$ xmllint --xpath "//ROOT/ITEM/PHOTO" xml_2015-05-13-20\:39.xml | sed "s/<\/PHOTO>/<\/PHOTO>\n/g"
<PHOTO>http://1.jpg</PHOTO>
<PHOTO>http://2.jpg</PHOTO>

Now, to get rid of the tags, an additional expression is required:

$ xmllint --xpath "//ROOT/ITEM/PHOTO" xml_2015-05-13-20\:39.xml | sed "s/<\/PHOTO>/<\/PHOTO>\n/g ; s/<[^>]\+>//g"
http://1.jpg
http://2.jpg
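
To see what the two sed expressions do without needing the XML file, here is a minimal sketch that feeds sed a sample string shaped like the unseparated xmllint output. The function name photo_urls and the variable names are only illustrative, and the \n in the replacement assumes GNU sed, as the commands above do.

```shell
# Hypothetical helper (not part of xmllint or sed): reads concatenated
# <PHOTO>...</PHOTO> elements on stdin and prints one URL per line.
# Assumes GNU sed, which accepts \n in the replacement text.
photo_urls() {
    # 1) turn each closing tag into a newline
    # 2) strip any remaining tags
    # 3) drop the empty line left after the last element
    sed -e 's/<\/PHOTO>/\n/g' -e 's/<[^>]*>//g' | sed '/^$/d'
}

# Sample input mimicking xmllint output with no separators between nodes.
sample='<PHOTO>http://1.jpg</PHOTO><PHOTO>http://2.jpg</PHOTO>'

out=$(printf '%s\n' "$sample" | photo_urls)
printf '%s\n' "$out"
```

With the real file, the same function can be placed at the end of the pipeline instead of the inline sed call, i.e. xmllint --xpath "//ROOT/ITEM/PHOTO" file.xml | photo_urls.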

Upvotes: 1

Adrian

Reputation: 2656

Here is a possible way to do that with xidel:

xidel -e "//ROOT/ITEM/PHOTO/text()" -q ./my.xml > ./processed_xml

Upvotes: 4
