Reputation: 5132
I am trying to parse the XML feed from YouTube whose URL is in the code below, and display all of the titles. However, when I try to print the 'title', only blank lines appear. Any advice?
#import library to do http requests:
import urllib2
#import easy to use xml parser called minidom:
from xml.dom.minidom import parseString
#all these imports are standard on most modern python implementations
#download the file:
file = urllib2.urlopen('http://gdata.youtube.com/feeds/api/users/buzzfeed/uploads?v=2&max-results=50')
#convert to string:
data = file.read()
#close file because we dont need it anymore:
file.close()
#parse the xml you downloaded
dom = parseString(data)
entry=dom.getElementsByTagName('entry')
for node in entry:
    video_title=node.getAttribute('title')
    print video_title
Upvotes: 1
Views: 1321
Reputation: 1
There is a small bug in your code. You access title as an attribute, although it is a child element of entry. Your code can be fixed as follows:
dom = parseString(data)
for node in dom.getElementsByTagName('entry'):
    print node.getElementsByTagName('title')[0].firstChild.data
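To see why the original loop printed only blank lines, here is a minimal sketch run against a small inline document instead of the live feed. The SAMPLE string is made up for illustration and is not real YouTube output; the code is written so it should run on Python 2 or 3.

```python
from xml.dom.minidom import parseString

# Hypothetical sample data: a tiny stand-in for the feed, not real YouTube output.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First video</title></entry>
  <entry><title>Second video</title></entry>
</feed>"""

dom = parseString(SAMPLE)
entries = dom.getElementsByTagName('entry')

# getAttribute() returns an empty string when the attribute does not exist,
# which is why the original loop printed only blank lines.
as_attributes = [node.getAttribute('title') for node in entries]

# The title text lives in a text node under the <title> child element.
as_children = [node.getElementsByTagName('title')[0].firstChild.data
               for node in entries]

print(as_attributes)
print(as_children)
```

The first list holds only empty strings, while the second holds the title text of each entry.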
Upvotes: 0
Reputation: 1791
lxml can be a bit difficult to figure out, so here's a really simple Beautiful Soup solution (it's called Beautiful Soup for a reason). You can also set up Beautiful Soup to use the lxml parser, so the speed is about the same.
from bs4 import BeautifulSoup
soup = BeautifulSoup(data) # data as is seen in your code
soup.findAll('title')
returns a list of title elements. You can also use soup.findAll('media:title') in this case to return just the media:title elements (the actual video names).
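The difference between title and media:title can also be checked without installing anything. The sketch below swaps in the standard-library minidom from the question rather than Beautiful Soup, and runs on a made-up SAMPLE entry (hypothetical data, not real feed output). The helper name texts is likewise only illustrative.

```python
from xml.dom.minidom import parseString

# Hypothetical sample entry; the media prefix is bound to the Media RSS namespace.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <title>Entry title</title>
    <media:group><media:title>Media title</media:title></media:group>
  </entry>
</feed>"""

dom = parseString(SAMPLE)


def texts(tag_name):
    """Return the text of every element whose qualified tag name matches."""
    return [el.firstChild.data for el in dom.getElementsByTagName(tag_name)]


# minidom matches on the full tag name, prefix included, so these two
# lookups select different elements.
plain_titles = texts('title')
media_titles = texts('media:title')

print(plain_titles)
print(media_titles)
```

Here 'title' picks up only the plain title element, and 'media:title' picks up only the prefixed one.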
Upvotes: 0
Reputation: 661
Title is not an attribute; it is a child element of an entry.
Here is an example of how to extract it:
for node in entry:
    video_title = node.getElementsByTagName('title')[0].firstChild.nodeValue
    print video_title
Upvotes: 1