Saeid Hedayati

Reputation: 83

How to get specific elements and sub-elements in lxml recursively?

I have an XML file that looks like this (this is only a small part of the file):

<article-set xmlns:ns0="http://casfwcewf.xsd" format-version="5">
<article>
 <article id="11234">
     <source>
     <hostname>some hostname for 11234</hostname>
     </source>
     <feed>
         <type>RSS</type>
     </feed>
     <uri>some uri for 11234</uri>
 </article>
 <article id="63563">
     <source>
     <hostname>some hostname for 63563 </hostname>
     </source>
     <feed>
         <type>RSS</type>
     </feed>
     <uri>some uri  for 63563</uri>
  </article>
.
.
.
</article></article-set>

What I want is to print each article id together with its own hostname and uri, for the whole document, like this:

id=11234 
uri= some uri for 11234
source=some hostname for 11234

id=63563 
uri= some uri for 63563
source=some hostname for 63563
.
.
.

I used this code to do so,

from lxml import etree
tree = etree.parse("C:\\Users\\me\\Desktop\\public.xml")

for article in tree.iter('article'):

    article_id=article.attrib.get('id')
    uri= tree.xpath("//article[@id]/uri/text()")
    source= tree.xpath("//article[@id]/source/hostname/text()")

    # I also tried these two lines:
    # source = article.attrib.get('hostname')
    # source = etree.SubElement(article, "hostname")

    print('id={!s}'.format(article_id), "\n")
    print('uri={!s}'.format(uri), "\n")
    print('source={!s}'.format(source), "\n")

It did not work: every article prints the uri and hostname values of all articles instead of only its own. Could someone help me with this?

Upvotes: 1

Views: 502

Answers (1)

Bill Bell

Reputation: 21663

There might very well be a much more clever way of writing this; however, this does appear to work.

>>> for article in tree.iter('article'):
...     article_id = article.attrib.get('id')
...     uri = tree.xpath("//article[@id='{}']/uri/text()".format(article_id))
...     source = tree.xpath("//article[@id='{}']/source/hostname/text()".format(article_id))
...     article_id, uri, source
...     
('11234', ['some uri for 11234'], ['some hostname for 11234'])
('63563', ['some uri  for 63563'], ['some hostname for 63563 '])
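The code in the question returns every value on each pass because an XPath starting with // is evaluated from the document root, so //article[@id] matches all articles no matter which one the loop is on. As a possible alternative to filtering by id, here is a rough sketch that looks up the children relative to the article element already in hand. The inline sample string, the article_rows helper and the ElementTree fallback import are assumptions added for illustration; the sample uses <articles> as the wrapper element.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback (an assumption for portability): the standard library offers
    # the same iter/get/findtext calls used below.
    import xml.etree.ElementTree as etree

# Illustrative sample data with <articles> as the wrapper element.
SAMPLE_XML = """<article-set xmlns:ns0="http://casfwcewf.xsd" format-version="5">
<articles>
 <article id="11234">
     <source>
     <hostname>some hostname for 11234</hostname>
     </source>
     <feed>
         <type>RSS</type>
     </feed>
     <uri>some uri for 11234</uri>
 </article>
 <article id="63563">
     <source>
     <hostname>some hostname for 63563 </hostname>
     </source>
     <feed>
         <type>RSS</type>
     </feed>
     <uri>some uri  for 63563</uri>
 </article>
</articles></article-set>"""


def article_rows(root):
    """Return (id, uri, hostname) tuples, one per <article> element."""
    rows = []
    for article in root.iter('article'):
        article_id = article.get('id')
        # Paths are relative to the current article, so only its own
        # children are matched.
        uri = article.findtext('uri', default='').strip()
        hostname = article.findtext('source/hostname', default='').strip()
        rows.append((article_id, uri, hostname))
    return rows


rows = article_rows(etree.fromstring(SAMPLE_XML))
for article_id, uri, hostname in rows:
    print('id={}\nuri={}\nsource={}\n'.format(article_id, uri, hostname))
```

With a file on disk, etree.parse(path).getroot() could be passed to the helper instead of the parsed string.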

Incidentally, I changed the XML so that the element just inside the container element is <articles> (rather than <article>), like this:

<article-set xmlns:ns0="http://casfwcewf.xsd" format-version="5">
<articles>
 <article id="11234">
     <source>
...
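If renaming the wrapper element is not an option, one possibility is to keep the XML from the question and select only the <article> elements that carry an id attribute, since iterating over every 'article' tag would also visit the outer wrapper, which has no id. This is only a sketch: the inline sample string, the rows_with_id helper and the ElementTree fallback import are assumptions for illustration.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback (an assumption for portability): the standard library
    # supports the same iterfind/findtext calls used below.
    import xml.etree.ElementTree as etree

# Illustrative sample keeping the question's layout: an outer <article>
# wrapper without an id around the real <article id="..."> entries.
ORIGINAL_XML = """<article-set xmlns:ns0="http://casfwcewf.xsd" format-version="5">
<article>
 <article id="11234">
     <source>
     <hostname>some hostname for 11234</hostname>
     </source>
     <uri>some uri for 11234</uri>
 </article>
 <article id="63563">
     <source>
     <hostname>some hostname for 63563 </hostname>
     </source>
     <uri>some uri  for 63563</uri>
 </article>
</article></article-set>"""


def rows_with_id(root):
    """Return (id, uri, hostname) for each <article> that has an id attribute."""
    rows = []
    # The [@id] predicate skips the wrapper element, which has no id.
    for article in root.iterfind('.//article[@id]'):
        rows.append((
            article.get('id'),
            article.findtext('uri', default='').strip(),
            article.findtext('source/hostname', default='').strip(),
        ))
    return rows


root = etree.fromstring(ORIGINAL_XML)
rows = rows_with_id(root)
for article_id, uri, hostname in rows:
    print('id={}\nuri={}\nsource={}\n'.format(article_id, uri, hostname))
```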

Upvotes: 1
