Reputation: 1
I have the following XML format.
I can get the title and link easily enough, but I am stuck on how to get the img src out of the content.
<feed>
<entry>
<title>A title</title>
<link rel="alternate" type="text/html" href="https://url.html"/>
<content> <figure> <img alt="text" src="https://image.jpg" /> <p>some text</p></content>
</entry>
</feed>
for entry in root.findall('{http://www.w3.org/2005/Atom}entry'):
    title = entry.find('{http://www.w3.org/2005/Atom}title').text
    link = entry.find('{http://www.w3.org/2005/Atom}link').attrib['href']
    img_url = entry.find('{http://www.w3.org/2005/Atom}src')
    arr.append([title, link, img_url])
    print(title, link)
return render(request, 'madxmlparser.html', {'arr':arr})
Upvotes: 0
Views: 144
Reputation: 3417
If your XML is well formed, you can look up the attribute value in several ways, for example:
import xml.etree.ElementTree as ET
from io import StringIO
xmlstr ="""<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<title>A title</title>
<link rel="alternate" type="text/html" href="https://url.html"/>
<content>
<figure>
<img alt="text" src="https://image.jpg" />
</figure>
<p>some text</p>
</content>
</entry>
</feed>"""
file = StringIO(xmlstr)
#ET.register_namespace('',"http://www.w3.org/2005/Atom")
tree = ET.parse(file)
root = tree.getroot()
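# option 1: walk every element and match the fully qualified tag name
for elem in root.iter():
    if elem.tag == '{http://www.w3.org/2005/Atom}img':
        print(elem.attrib['src'])
        print(elem.get('src'))
# option 2: map the empty prefix to the Atom namespace and search by path
ns = {'': "http://www.w3.org/2005/Atom"}
attr = root.find(".//img", ns).attrib['src']
print(attr)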
Output:
https://image.jpg
https://image.jpg
https://image.jpg
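To tie this back to the loop in the question, here is a minimal sketch that collects the title, link, and image URL for each entry. It assumes the feed declares the Atom namespace as in the sample above; the `atom` prefix, the `extract_entries` name, and the `SAMPLE` string are illustrative choices, not part of the original code.

```python
import xml.etree.ElementTree as ET

# Prefix chosen for illustration; any prefix works as long as it maps to the Atom URI.
NS = {"atom": "http://www.w3.org/2005/Atom"}

SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>A title</title>
    <link rel="alternate" type="text/html" href="https://url.html"/>
    <content><figure><img alt="text" src="https://image.jpg"/></figure><p>some text</p></content>
  </entry>
</feed>"""


def extract_entries(xml_text):
    """Return a list of [title, link, img_url], one per Atom entry."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        link_el = entry.find("atom:link", NS)
        link = link_el.get("href") if link_el is not None else None
        # img is nested inside content/figure, so search all descendants of the entry.
        img_el = entry.find(".//atom:img", NS)
        img_url = img_el.get("src") if img_el is not None else None
        rows.append([title, link, img_url])
    return rows


print(extract_entries(SAMPLE))
```

The key difference from the question's loop is that `src` is an attribute of the nested `img` element, so the search is for `img` under the entry and the attribute is then read with `get('src')`.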
Upvotes: 0
Reputation: 23815
but am getting stuck on how I can get the img src out of the content.
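See below. The closing </figure> tag has been added so the XML is well formed, and because this feed declares no namespace, a plain path is enough: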
import xml.etree.ElementTree as ET
xml = '''<feed>
<entry>
<title>A title</title>
<link rel="alternate" type="text/html" href="https://url.html"/>
<content> <figure> <img alt="text" src="https://image.jpg" /> <p>some text</p></figure></content>
</entry>
</feed>'''
root = ET.fromstring(xml)
print(root.find('.//img').attrib['src'])
Output:
https://image.jpg
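If some entries have no image, `find` returns None and `.attrib` will raise an AttributeError. A minimal sketch of a per-entry lookup that guards against this, assuming a feed without a namespace as above (the `image_sources` name and the two-entry `SAMPLE` are made up for illustration):

```python
import xml.etree.ElementTree as ET

SAMPLE = '''<feed>
<entry>
<title>With image</title>
<content><figure><img alt="text" src="https://image.jpg" /></figure></content>
</entry>
<entry>
<title>No image</title>
<content><p>some text</p></content>
</entry>
</feed>'''


def image_sources(xml_text):
    """Map each entry title to its first img src, or None if the entry has no img."""
    root = ET.fromstring(xml_text)
    result = {}
    for entry in root.findall('entry'):
        img = entry.find('.//img')
        # find() returns None when nothing matches, so guard before reading attributes.
        result[entry.findtext('title')] = img.get('src') if img is not None else None
    return result


print(image_sources(SAMPLE))
```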
Upvotes: 0