Reputation: 607
I have the following XML.
I am using the ElementTree library to extract the values.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>Test 1</loc>
</url>
<url>
<loc>Test 2</loc>
</url>
<url>
<loc>Test 3</loc>
</url>
</urlset>
I need to get the values out of the 'loc' tags.
Desired Output:
Test 1
Test 2
Test 3
Tried Code:
import xml.etree.ElementTree as ET

tree = ET.parse('sitemap.xml')
root = tree.getroot()
for atype in root.findall('url'):
    rank = atype.find('loc').text
    print(rank)
Any suggestions on where I am going wrong?
Upvotes: 0
Views: 2910
Reputation: 25779
Your XML has a default namespace (http://www.sitemaps.org/schemas/sitemap/0.9), so you either have to address every tag by its fully qualified name:
import xml.etree.ElementTree as ET

tree = ET.parse('sitemap.xml')
root = tree.getroot()
# Each tag name is prefixed with the namespace URI in curly braces.
for atype in root.findall('{http://www.sitemaps.org/schemas/sitemap/0.9}url'):
    rank = atype.find('{http://www.sitemaps.org/schemas/sitemap/0.9}loc').text
    print(rank)
Or define a namespace map and pass it to each lookup:
nsmap = {"ns": "http://www.sitemaps.org/schemas/sitemap/0.9"}
tree = ET.parse('sitemap.xml')
root = tree.getroot()
for atype in root.findall('ns:url', nsmap):
    rank = atype.find('ns:loc', nsmap).text
    print(rank)
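To see why the unqualified lookup finds nothing, here is a minimal self-contained sketch. It assumes the sitemap is held in an inline string (the `SITEMAP` constant and the `extract_locs` helper are made up for illustration, not part of the question's code), and it strips surrounding whitespace in case a value has any:

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of a sitemap, so the sketch runs without a file.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc> Test 1</loc>
</url>
<url>
<loc>Test 2</loc>
</url>
<url>
<loc>Test 3</loc>
</url>
</urlset>"""

# The prefix "ns" is arbitrary; only the URI has to match the document.
NSMAP = {"ns": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_text):
    """Return the stripped text of every <loc> element in the sitemap namespace."""
    root = ET.fromstring(xml_text)
    # ".//" searches all descendants, so there is no need to loop over <url> first.
    return [(loc.text or "").strip() for loc in root.findall(".//ns:loc", NSMAP)]


root = ET.fromstring(SITEMAP)
print(root.tag)             # the tag name includes the namespace URI in braces
print(root.findall("url"))  # the unqualified name matches nothing: []
for value in extract_locs(SITEMAP):
    print(value)
```

The first print shows `{http://www.sitemaps.org/schemas/sitemap/0.9}urlset`, which is the name ElementTree actually stores, so a plain `'url'` never matches it.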
Upvotes: 2
Reputation: 533
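from lxml import etree

tree = etree.parse('sitemap.xml')
# Walk every element regardless of namespace and keep those whose text contains 'Test'.
for element in tree.iter('*'):
    # element.text can be None for empty elements, so guard before searching.
    if element.text and 'Test' in element.text:
        print(element.text)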
Probably isn't the most beautiful solution, but it works :)
Upvotes: 0