metalboxhead

Reputation: 1

Get image URL sub-element using Python ElementTree

I have the following XML format.

I can get the title and link easily enough, but I am stuck on how to get the img src out of the content.

<feed>
<entry>
<title>A title</title>
<link rel="alternate" type="text/html" href="https://url.html"/>
<content> <figure> <img alt="text" src="https://image.jpg" /> <p>some text</p></content>
</entry>
</feed>

    for entry in root.findall('{http://www.w3.org/2005/Atom}entry'):
        title = entry.find('{http://www.w3.org/2005/Atom}title').text
        link = entry.find('{http://www.w3.org/2005/Atom}link').attrib['href']
        img_url = entry.find('{http://www.w3.org/2005/Atom}src')
        arr.append([title, link, img_url])
        print(title, link)
    return render(request, 'madxmlparser.html', {'arr':arr})

Upvotes: 0

Views: 144

Answers (2)

Hermann12

Reputation: 3417

If your XML is well formed, you can look up the attribute value in different ways, for example:

import xml.etree.ElementTree as ET
from io import StringIO

xmlstr ="""<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
   <entry>
      <title>A title</title>
      <link rel="alternate" type="text/html" href="https://url.html"/>
      <content> 
        <figure> 
          <img alt="text" src="https://image.jpg" /> 
        </figure>
        <p>some text</p>
      </content>
   </entry>
</feed>"""

file = StringIO(xmlstr)

#ET.register_namespace('',"http://www.w3.org/2005/Atom")

tree = ET.parse(file)
root = tree.getroot()

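# Option 1: walk every element and match the fully qualified tag name
for elem in root.iter():
    if elem.tag == '{http://www.w3.org/2005/Atom}img':
        print(elem.attrib['src'])  # raises KeyError if the attribute is missing
        print(elem.get('src'))     # returns None if the attribute is missing

# Option 2: map the empty prefix to the default namespace (Python 3.8+)
ns = {'': "http://www.w3.org/2005/Atom"}
attr = root.find(".//img", ns).attrib['src']
print(attr)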

Output:

https://image.jpg
https://image.jpg
https://image.jpg
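
Applied to the per-entry loop from the question, the same lookup could be sketched as follows. This is a minimal sketch, assuming the feed declares the Atom default namespace and the content is well-formed XML; the `atom` prefix, the `FEED` sample, and the `extract_entries` helper are illustrative names, not part of the original code.

```python
import xml.etree.ElementTree as ET

# The prefix "atom" is an arbitrary choice for this sketch.
NS = {"atom": "http://www.w3.org/2005/Atom"}

# Assumed sample feed: the question's XML with the namespace declared
# and the figure element closed.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>A title</title>
    <link rel="alternate" type="text/html" href="https://url.html"/>
    <content>
      <figure>
        <img alt="text" src="https://image.jpg" />
      </figure>
      <p>some text</p>
    </content>
  </entry>
</feed>"""


def extract_entries(root):
    """Return [title, link, img_url] per entry (hypothetical helper)."""
    rows = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        link_el = entry.find("atom:link", NS)
        link = link_el.get("href") if link_el is not None else None
        # img is nested inside content/figure, so search all descendants.
        img_el = entry.find(".//atom:img", NS)
        img_url = img_el.get("src") if img_el is not None else None
        rows.append([title, link, img_url])
    return rows


rows = extract_entries(ET.fromstring(FEED))
print(rows)
```

Checking each element for `None` before reading an attribute keeps one entry without an image from aborting the whole loop.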

Upvotes: 0

balderman

Reputation: 23815

but am getting stuck on how I can get the img src out of the content.

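See below. With well-formed XML (note the closing figure tag in the sample) and no namespace declaration, a descendant search with .//img is enough: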

import xml.etree.ElementTree as ET
xml = '''<feed>
   <entry>
      <title>A title</title>
      <link rel="alternate" type="text/html" href="https://url.html"/>
      <content> <figure> <img alt="text" src="https://image.jpg" /> <p>some text</p></figure></content>
   </entry>
</feed>'''

root = ET.fromstring(xml)
print(root.find('.//img').attrib['src'])

Output:

https://image.jpg
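
One caveat worth illustrating: the unqualified `.//img` search matches only when the feed has no default namespace. The sketch below, assuming Python 3.8 or later for the `{*}` wildcard, compares the two cases; the variable names `plain`, `atom`, and `wild` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Same structure twice: once without and once with the Atom default namespace.
plain = ET.fromstring(
    '<feed><entry><content><img src="https://image.jpg"/></content></entry></feed>'
)
atom = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<entry><content><img src="https://image.jpg"/></content></entry></feed>'
)

# Without a namespace, the bare tag name matches.
print(plain.find(".//img").get("src"))

# With a default namespace, the bare tag name no longer matches.
print(atom.find(".//img"))

# The {*} wildcard matches img in any namespace, or in none.
wild = atom.find(".//{*}img")
print(wild.get("src"))
```

If the real feed is served with `xmlns="http://www.w3.org/2005/Atom"`, as the tag names in the question's loop suggest, the wildcard or an explicit namespace mapping would be needed.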

Upvotes: 0
