sharataka

Reputation: 5132

How to parse YouTube XML using Python?

I am trying to parse the XML that the code below downloads from YouTube, and I want to display all of the video titles. However, when I print the title, only blank lines appear. Any advice?

#import library to do http requests:
import urllib2

#import easy to use xml parser called minidom:
from xml.dom.minidom import parseString
#all these imports are standard on most modern python implementations

#download the file:
file = urllib2.urlopen('http://gdata.youtube.com/feeds/api/users/buzzfeed/uploads?v=2&max-results=50')
#convert to string:
data = file.read()
#close file because we don't need it anymore:
file.close()

#parse the xml you downloaded
dom = parseString(data)
entry=dom.getElementsByTagName('entry')
for node in entry:
    video_title=node.getAttribute('title')
    print video_title

Upvotes: 1

Views: 1321

Answers (3)

Skovhus

Reputation: 1

There is a small bug in your code. You access title as an attribute, although it is a child element of entry. You can fix your code like this:

dom = parseString(data)
for node in dom.getElementsByTagName('entry'):
    print node.getElementsByTagName('title')[0].firstChild.data
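
To see why the original loop printed only blank lines, here is a small self-contained sketch. It assumes a made-up two-entry feed (SAMPLE is a hypothetical stand-in, not the real YouTube response) and is written for Python 3, so it calls print() as a function:

```python
from xml.dom.minidom import parseString

# Hypothetical, trimmed-down stand-in for the downloaded feed.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First video</title></entry>
  <entry><title>Second video</title></entry>
</feed>"""

dom = parseString(SAMPLE)
entries = dom.getElementsByTagName("entry")

# <entry> has no title="..." attribute, so getAttribute returns an
# empty string for every entry. Printing those gives blank lines.
attr_values = [node.getAttribute("title") for node in entries]

# The title is a child element; its text lives in the child text node.
titles = [
    node.getElementsByTagName("title")[0].firstChild.data
    for node in entries
]

print(attr_values)
print(titles)
```

The first list holds only empty strings, which matches the blank lines in the question; the second holds the actual title text.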

Upvotes: 0

kreativitea

Reputation: 1791

lxml can be a bit difficult to figure out, so here's a really simple Beautiful Soup solution (it's called Beautiful Soup for a reason). You can also set up Beautiful Soup to use the lxml parser, so the speed is about the same.

from bs4 import BeautifulSoup
soup = BeautifulSoup(data) # data as is seen in your code
soup.findAll('title')

This returns a list of title elements. In this case you can also use soup.findAll('media:title') to return just the media:title elements (the actual video names).
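
If Beautiful Soup is not installed, the same split between every title and just the media:title elements can be sketched with the standard library's xml.etree.ElementTree instead. This is a different technique from the Beautiful Soup call above, not a drop-in replacement. SAMPLE is a hypothetical cut-down feed, and the sketch assumes the real feed uses the usual Atom and Media RSS namespace URIs shown here:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the downloaded feed.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:media="http://search.yahoo.com/mrss/">
  <title>Uploads by buzzfeed</title>
  <entry>
    <title>First video</title>
    <media:group><media:title>First video</media:title></media:group>
  </entry>
  <entry>
    <title>Second video</title>
    <media:group><media:title>Second video</media:title></media:group>
  </entry>
</feed>"""

# Prefix-to-URI map used by findall; the prefixes are local names
# chosen for this sketch.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "media": "http://search.yahoo.com/mrss/",
}

root = ET.fromstring(SAMPLE)

# Every Atom <title>, including the feed-level one.
all_titles = [t.text for t in root.iter("{http://www.w3.org/2005/Atom}title")]

# Only the <media:title> elements, i.e. the video names.
video_titles = [t.text for t in root.findall(".//media:title", NS)]

print(all_titles)
print(video_titles)
```

Note that the plain title search also picks up the feed's own title, which is one reason to prefer media:title when you only want video names.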

Upvotes: 0

asciimoo

Reputation: 661

Title is not an attribute; it is a child element of an entry.

Here is an example of how to extract it:

for node in entry:
    video_title = node.getElementsByTagName('title')[0].firstChild.nodeValue
    print video_title
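
One caveat: indexing with [0] and reading firstChild will raise an error if an entry has no title element or the element is empty. A minimal defensive sketch follows, assuming that can happen in your data. The helper name entry_title and the SAMPLE feed are made up for illustration, and the code targets Python 3:

```python
from xml.dom.minidom import parseString

# Hypothetical feed with one normal, one empty, and one missing title.
SAMPLE = """<feed>
  <entry><title>First video</title></entry>
  <entry><title/></entry>
  <entry><id>no-title-here</id></entry>
</feed>"""


def entry_title(entry):
    """Return the text of the entry's first <title>, or None if absent or empty."""
    titles = entry.getElementsByTagName("title")
    if not titles or titles[0].firstChild is None:
        return None
    return titles[0].firstChild.nodeValue


dom = parseString(SAMPLE)
results = [entry_title(e) for e in dom.getElementsByTagName("entry")]
print(results)
```

Entries without usable title text come back as None, so the caller can decide whether to skip them or print a placeholder.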

Upvotes: 1
