How to parse XML of nested tags in Python

Question

I have the following XML:


     
        active 
        
             
                
                    Understanding Wooden Chair
                    http://abcd.xyz.com/1111?view=app
                 
                
                    How To Assemble Wooden CHair
                    http://abcd.xyz.com/2222?view=app
                 
                
                    Wooden Chair Tutorial
                    /
                 
                
                    How To Access Wood
                    /
                 
            
        
     
     
        active 
        
             
                
                    Understanding Tables
                    http://abcd.xyz.com/3333?view=app
                 
                
                    Set-up Table
                    http://abcd.xyz.com/4444?view=app
                 
                
                    How To Change table
                    http://abcd.xyz.com/5555?view=app

I am trying to parse this XML in Python and build a list of URLs that contains:

1. All the http URLs present in the XML.
2. For each link tag that has a YouTube ID, a YouTube URL built from that ID.

I have the following code, but it does not give me the URLs and links.

from xml.etree import ElementTree

with open('faq.xml', 'rt') as f:
    tree = ElementTree.parse(f)

for node in tree.iter():
    print(node.tag, node.attrib.get('url'))

for node in tree.iter('outline'):
    name = node.attrib.get('link')
    url = node.attrib.get('url')
    if name and url:
        print('  %s :: %s' % (name, url))
    else:
        print(name)
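Part of why the first loop prints None for the URL is that attrib.get() only looks at attributes. Judging from the accepted answer's code, youtubeId is an attribute of <link>, while <url> and <label> are child elements. This small sketch uses a hypothetical fragment: the element and attribute names come from the answer, but the values are made up. It shows the difference between get() and find():

```python
from xml.etree import ElementTree

# Hypothetical fragment: <link>, <label>, <url> and youtubeId are taken from
# the accepted answer's code; the values are invented for illustration.
fragment = """<link youtubeId="abc123">
    <label>Wooden Chair Tutorial</label>
    <url>/</url>
</link>"""

link = ElementTree.fromstring(fragment)

# youtubeId is an attribute, so get() finds it
print(link.get('youtubeId'))   # abc123
# url is a child element, not an attribute, so get() returns None
print(link.get('url'))         # None
# find() returns the child element; its .text holds the content
print(link.find('url').text)   # /
```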

How can I get all of the URLs?

I developed the following code based on the answers below. The problem is that it prints just one URL, not all of them.

from xml.etree import ElementTree

def fetch_faq_urls():
    url_list = []
    with open('faq.xml', 'rt') as f:
        tree = ElementTree.parse(f)

    for link in tree.iter('link'):
        youtube = link.get('youtubeId')
        if youtube:
            print("https://www.youtube.com/watch?v=" + youtube)
            video_url = "https://www.youtube.com/watch?v=" + youtube
            url_list.append(video_url)
            # print("youtubeId", link.find('label').text, '???')
        else:
            print(link.find('url').text)
            article_url = link.find('url').text
            url_list.append(article_url)
            # print('url', link.find('label').text)
        return url_list

faqs = fetch_faq_urls()
print(faqs)
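A likely reason only one URL comes back is where return url_list sits. If it is inside the for loop, the function returns after the first <link>. Below is a minimal Python 3 sketch of the same function with the return moved after the loop. The source parameter and the SAMPLE_XML document are hypothetical additions so it can be tried without faq.xml. Only <link>, <label>, <url> and youtubeId come from the question, and the outer <faqs>/<faq>/<links> names are assumptions.

```python
import io
from xml.etree import ElementTree

YOUTUBE_WATCH = "https://www.youtube.com/watch?v="

# Hypothetical stand-in for faq.xml; the wrapper element names and the
# youtubeId value are assumptions.
SAMPLE_XML = """<faqs>
  <faq>
    <links>
      <link>
        <label>Understanding Wooden Chair</label>
        <url>http://abcd.xyz.com/1111?view=app</url>
      </link>
      <link youtubeId="abc123">
        <label>Wooden Chair Tutorial</label>
        <url>/</url>
      </link>
      <link>
        <label>How To Assemble Wooden Chair</label>
        <url>http://abcd.xyz.com/2222?view=app</url>
      </link>
    </links>
  </faq>
</faqs>"""


def fetch_faq_urls(source='faq.xml'):
    """Return one URL per <link>: a YouTube watch URL if the link has a
    youtubeId attribute, otherwise the text of its <url> child."""
    tree = ElementTree.parse(source)  # accepts a file name or a file object
    url_list = []
    for link in tree.iter('link'):
        youtube = link.get('youtubeId')
        if youtube:
            url_list.append(YOUTUBE_WATCH + youtube)
        else:
            url_list.append(link.find('url').text)
    # Return only after the loop has visited every <link>.
    return url_list


print(fetch_faq_urls(io.StringIO(SAMPLE_XML)))
```

Called with no argument, fetch_faq_urls() parses faq.xml from the working directory, as the original did.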

tdelaney · Accepted Answer

The information you want is under <link>, so just iterate through those elements. Use get() to read the youtubeId attribute and find() to get a child element such as <label> or <url>.

from xml.etree import ElementTree

with open('faq.xml', 'rt') as f:
    tree = ElementTree.parse(f)

for link in tree.iter('link'):
    # youtubeId is an attribute of <link>; label and url are child elements
    youtube = link.get('youtubeId')
    if youtube:
        print("youtube", link.find('label').text, '???')
    else:
        print('url', link.find('label').text, link.find('url').text)
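The loop above calls link.find('url').text, which raises AttributeError if a <link> has no <url> child. It also keeps the "/" placeholders seen in the XML. The sketch below is a slightly more defensive variant, assuming the same <link>/<url>/youtubeId layout. It uses findtext(), which returns None for a missing child instead of raising, and it keeps only http:// or https:// URLs to match the "all the http URLs" requirement. The collect_urls name, the wrapper element names and the sample document are hypothetical.

```python
from xml.etree import ElementTree

YOUTUBE_WATCH = "https://www.youtube.com/watch?v="

# Hypothetical sample; labels, IDs and wrapper element names are invented.
SAMPLE_XML = """<faqs>
  <faq>
    <links>
      <link>
        <label>Understanding Tables</label>
        <url>http://abcd.xyz.com/3333?view=app</url>
      </link>
      <link youtubeId="xyz789">
        <label>Wooden Chair Tutorial</label>
        <url>/</url>
      </link>
      <link>
        <label>Placeholder only</label>
        <url>/</url>
      </link>
      <link>
        <label>No url child</label>
      </link>
    </links>
  </faq>
</faqs>"""


def collect_urls(root):
    """Return YouTube watch URLs built from youtubeId attributes plus the
    http(s) URLs found in <url> children, in document order."""
    urls = []
    for link in root.iterfind('.//link'):  # every <link> below root
        youtube = link.get('youtubeId')
        if youtube:
            urls.append(YOUTUBE_WATCH + youtube)
            continue
        # findtext() gives None (not an exception) when <url> is missing
        url = (link.findtext('url') or '').strip()
        if url.startswith(('http://', 'https://')):
            urls.append(url)
    return urls


root = ElementTree.fromstring(SAMPLE_XML)
print(collect_urls(root))
```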
