Reputation: 107
I'm trying to figure out how to use lxml to parse the xml from a url to return the value of the title attribute. Does anyone know what I have wrong or what would return the Title value/text? So in the example below I want to return the value of 'Weeds - S05E05 - Van Nuys - HD TV'
XML from URL:
<?xml version="1.0" encoding="UTF-8"?>
<subsonic-response xmlns="http://subsonic.org/restapi" status="ok" version="1.8.0">
<song id="11345" parent="11287" title="Weeds - S05E05 - Van Nuys - HD TV" album="Season 5" artist="Weeds" isDir="false" created="2009-07-06T22:21:16" duration="1638" bitRate="384" size="782304110" suffix="mkv" contentType="video/x-matroska" isVideo="true" path="Weeds/Season 5/Weeds - S05E05 - Van Nuys - HD TV.mkv" transcodedSuffix="flv" transcodedContentType="video/x-flv"/>
</subsonic-response>
My current Python code:
import lxml
from lxml import html
from urllib2 import urlopen
url = 'https://myurl.com'
tree = html.parse(urlopen(url))
songs = tree.findall('{*}song')
for song in songs:
print song.attrib['title']
With the above code I get no data return, any ideas?
print out of tree =
<lxml.etree._ElementTree object at 0x0000000003348F48>
print out of songs =
[]
Upvotes: 2
Views: 1175
Reputation: 107
Thanks for the help guys, I used a combination of both of yours to get it working.
import xml.etree.ElementTree as ET
from urllib2 import urlopen
url = 'https://myurl.com'
root = ET.parse(urlopen(url)).getroot()
for song in root:
print song.attrib['title']
Upvotes: 0
Reputation: 1124538
First of all, you are not actually using lxml
in your code. You import the lxml
HTML parser, but otherwise ignore it and just use the standard library xml.etree.ElementTree
module instead.
Secondly, you search for data/song
but you do not have any data
elements in your document, so no matches will be found. And last, but not least, you have a document there that uses namespaces. You'll have to include those when searching for elements, or use a {*}
wildcard search.
The following finds songs for you:
from lxml import etree
tree = etree.parse(URL) # lxml can load URLs for you
songs = tree.findall('{*}song')
for song in songs:
print song.attrib['title']
To use an explicit namespace, you'd have to replace the {*}
wildcard with the full namespace URL; the default namespace is available in the .nsmap
namespace dict on the tree
object:
namespace = tree.nsmap[None]
songs = tree.findall('{%s}song' % namespace)
Upvotes: 3
Reputation: 26848
The whole issue is with the fact that the subsonic-response
tag has a xmlns
attribute indicating that there is an xml namespace in effect. The below code takes that into account and correctly pigs up the song tags.
import xml.etree.ElementTree as ET
root = ET.parse('test.xml').getroot()
print root.findall('{http://subsonic.org/restapi}song')
Upvotes: 0