Print XML element with AWK

Question

How do I print the contents of an XML element - from the starting tag to the closing tag - using AWK?

For example, consider the following XML:


    Delta
    22
    Atlanta
    Paris
    5:40pm
    8:10am

 
       Athens 
       GA
        Home of the University of Georgia
       100,000
       Located about 60 miles Northeast of Atlanta
       33 57' 39" N
       83 22' 42" W

The desired output could be the contents of the city element, from <city> to </city>.

Mark O'Connor · Accepted Answer

Solutions that parse XML with line-oriented tools like awk and sed are imperfect, because you cannot rely on XML always having a human-readable layout. For example, some web services omit newlines, so the entire XML document appears on one line.
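
For completeness, a line-oriented AWK approach could look like the sketch below. It uses an AWK range pattern and assumes the opening and closing city tags each sit on a line of their own. The sample file and its child element names (airline, name, state) are made up for illustration and are not taken from the original data.

```shell
#!/bin/sh
# Hypothetical sample file; the element names inside <flight> and <city>
# are assumptions made for this illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<data>
  <flight>
    <airline>Delta</airline>
  </flight>
  <city id="AT">
    <name>Athens</name>
    <state>GA</state>
  </city>
</data>
EOF

# Range pattern: print every line from one that contains an opening
# <city ...> tag through the next line that contains </city>.
# AWK prints whole lines, so this is only exact when each tag is on
# a line of its own.
print_city() {
  awk '/<city[ >]/,/<\/city>/' "$1"
}

result=$(print_city "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

If the whole document arrives on a single line, the range pattern matches that one line and prints the entire document, which is exactly the weakness described above.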

I would recommend using xmllint, which can select nodes using XPath, a query language designed for XML.

The following command will select the city tags:

xmllint --xpath "//city" data.xml

XPath is extremely useful. It makes every part of the XML document addressable:

xmllint --xpath "string(//city[1]/@id)" data.xml

This returns the string "AT".
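
A few more XPath expressions may help show what "addressable" means in practice. This is only a sketch against a made-up file: the name child element and the "DB" id are assumptions for illustration, and it requires xmllint to be installed with --xpath support.

```shell
#!/bin/sh
# Hypothetical sample: the <name> child and the "DB" id are assumptions,
# not taken from the original data.xml.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<data>
  <city id="AT"><name>Athens</name></city>
  <city id="DB"><name>Dublin</name></city>
</data>
EOF

# Count all city elements anywhere in the document.
city_count=$(xmllint --xpath "count(//city)" "$sample")

# Select a city by attribute value and read the text of its <name> child.
city_name=$(xmllint --xpath "string(//city[@id='AT']/name)" "$sample")

# Read the id attribute of the city that is last among its siblings.
last_id=$(xmllint --xpath "string(//city[last()]/@id)" "$sample")

printf 'count=%s name=%s last=%s\n' "$city_count" "$city_name" "$last_id"
rm -f "$sample"
```

Wrapping an expression in string() or count() makes xmllint print a plain value rather than the serialized node, which is convenient when the result feeds into a shell variable.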

Poorly formatted XML data

This time, select only the first "city" element. xmllint can also pretty-print the result:

$ xmllint --xpath "//city[1]" data.xml | xmllint --format -


  Athens
  GA
   Home of the University of Georgia
  100,000
  Located about 60 miles Northeast of Atlanta
  33 57' 39" N
  83 22' 42" W

data.xml

In this sample data, the first "city" element appears entirely on one line. This is still valid XML.


  
    Delta
    22
    Atlanta
    Paris
    5:40pm
    8:10am
  
   Athens GA  Home of the University of Georgia 100,000 Located about 60 miles Northeast of Atlanta 33 57' 39" N 83 22' 42" W 
  
    Dublin
    Dub
     Dublin
    1,500,000
    Ireland
    NA
    NA
