Aman Saraf

Reputation: 607

Get Values from child nodes from XML | Python

I have the following XML, and I am using the ElementTree library to extract values from it.

<?xml version="1.0" encoding="UTF-8"?>

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
  <loc>Test 1</loc>
 </url>
 <url>
  <loc>Test 2</loc>
 </url>
 <url>
  <loc>Test 3</loc>
 </url>
</urlset>

I need to get the text values of the 'loc' tags.

Desired Output:

Test 1
Test 2
Test 3

Tried Code:

tree = ET.parse('sitemap.xml')
root = tree.getroot()
for atype in root.findall('url'):
 rank = atype.find('loc').text
print (rank)

Any suggestions on where I am going wrong?

Upvotes: 0

Views: 2910

Answers (2)

zwer

Reputation: 25779

Your XML has a default namespace (http://www.sitemaps.org/schemas/sitemap/0.9), so findall('url') matches nothing. You either have to qualify every tag name with the namespace:

import xml.etree.ElementTree as ET

tree = ET.parse('sitemap.xml')
root = tree.getroot()
# Tag names are written as {namespace-uri}localname
for atype in root.findall('{http://www.sitemaps.org/schemas/sitemap/0.9}url'):
    rank = atype.find('{http://www.sitemaps.org/schemas/sitemap/0.9}loc').text
    print(rank)

Or define a namespace map and pass it to findall() and find():

nsmap = {"ns": "http://www.sitemaps.org/schemas/sitemap/0.9"}

tree = ET.parse('sitemap.xml')
root = tree.getroot()
for atype in root.findall('ns:url', nsmap):
    rank = atype.find('ns:loc', nsmap).text
    print(rank)
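
If you only need the 'loc' values, a shorter variant is to iterate over the 'loc' elements directly instead of going through each 'url' first. The following is a minimal sketch, not the only way to do it: the XML is inlined as a bytes literal so it runs without a file, the function names are made up for illustration, and the `{*}` wildcard form assumes Python 3.8 or newer. The text is stripped because one of the values in the question has a leading space.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Inline copy of the sitemap from the question (assumption: same structure
# as sitemap.xml), so the sketch runs without a file on disk.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
  <loc> Test 1</loc>
 </url>
 <url>
  <loc>Test 2</loc>
 </url>
 <url>
  <loc>Test 3</loc>
 </url>
</urlset>
"""


def locs_by_qualified_name(root):
    """Collect the text of every <loc>, using the fully qualified tag name."""
    tag = "{%s}loc" % SITEMAP_NS
    # el.text can be None for an empty element, hence the `or ""`.
    return [(el.text or "").strip() for el in root.iter(tag)]


def locs_by_wildcard(root):
    """Same result, matching <loc> in any namespace (Python 3.8+)."""
    return [(el.text or "").strip() for el in root.findall(".//{*}loc")]


root = ET.fromstring(SAMPLE)
for value in locs_by_qualified_name(root):
    print(value)
```

With a real file, replace ET.fromstring(SAMPLE) with ET.parse('sitemap.xml').getroot().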

Upvotes: 2

Attila Kis

Reputation: 533

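from lxml import etree

tree = etree.parse('sitemap.xml')
# Walk every element and print the text of those containing 'Test'
for element in tree.iter('*'):
    # element.text is None for elements without text content
    if element.text and 'Test' in element.text:
        print(element.text)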

This probably isn't the most elegant solution, since it matches any element whose text contains 'Test' rather than selecting the 'loc' tags, but it works :)

Upvotes: 0
