nutt318
nutt318

Reputation: 107

Python - use lxml to return value of title.text attrib

I'm trying to figure out how to use lxml to parse the xml from a url to return the value of the title attribute. Does anyone know what I have wrong or what would return the Title value/text? So in the example below I want to return the value of 'Weeds - S05E05 - Van Nuys - HD TV'

XML from URL:

<?xml version="1.0" encoding="UTF-8"?>
<subsonic-response xmlns="http://subsonic.org/restapi" status="ok" version="1.8.0">
<song id="11345" parent="11287" title="Weeds - S05E05 - Van Nuys - HD TV" album="Season 5" artist="Weeds" isDir="false" created="2009-07-06T22:21:16" duration="1638" bitRate="384" size="782304110" suffix="mkv" contentType="video/x-matroska" isVideo="true" path="Weeds/Season 5/Weeds - S05E05 - Van Nuys - HD TV.mkv" transcodedSuffix="flv" transcodedContentType="video/x-flv"/>
</subsonic-response>

My current Python code:

import lxml
from lxml import html
from urllib2 import urlopen

url = 'https://myurl.com'

tree = html.parse(urlopen(url))
songs = tree.findall('{*}song')
for song in songs:
    print song.attrib['title']

With the above code I get no data return, any ideas?

print out of tree =

<lxml.etree._ElementTree object at 0x0000000003348F48>

print out of songs =

[]

Upvotes: 2

Views: 1175

Answers (3)

nutt318
nutt318

Reputation: 107

Thanks for the help guys, I used a combination of both of yours to get it working.

import xml.etree.ElementTree as ET
from urllib2 import urlopen

url = 'https://myurl.com'
root = ET.parse(urlopen(url)).getroot()
for song in root:
    print song.attrib['title']

Upvotes: 0

Martijn Pieters
Martijn Pieters

Reputation: 1124538

First of all, you are not actually using lxml in your code. You import the lxml HTML parser, but otherwise ignore it and just use the standard library xml.etree.ElementTree module instead.

Secondly, you search for data/song but you do not have any data elements in your document, so no matches will be found. And last, but not least, you have a document there that uses namespaces. You'll have to include those when searching for elements, or use a {*} wildcard search.

The following finds songs for you:

from lxml import etree

tree = etree.parse(URL)  # lxml can load URLs for you
songs = tree.findall('{*}song')
for song in songs:
    print song.attrib['title']

To use an explicit namespace, you'd have to replace the {*} wildcard with the full namespace URL; the default namespace is available in the .nsmap namespace dict on the tree object:

namespace = tree.nsmap[None]
songs = tree.findall('{%s}song' % namespace)

Upvotes: 3

Marwan Alsabbagh
Marwan Alsabbagh

Reputation: 26848

The whole issue is with the fact that the subsonic-response tag has a xmlns attribute indicating that there is an xml namespace in effect. The below code takes that into account and correctly pigs up the song tags.

import xml.etree.ElementTree as ET
root = ET.parse('test.xml').getroot()
print root.findall('{http://subsonic.org/restapi}song')

Upvotes: 0

Related Questions