Reputation: 8043
I'm writing part of a program that searches YouTube for a phrase, and I want it to get the URL of the first video in the results, but I can't figure out how to do that.
Here is my code:
import urllib2, urllib
raw_i=raw_input("Search: ")
x = urllib.quote_plus(raw_i)
site1 = urllib2.urlopen('http://www.youtube.com/results?search_query=%s'%x)
y = site1.read()
This reads the search page, but I want it to return just the URL of the first video.
For example, let's use the phrase "Coconut by Harry Nilsson".
Here is the HTML for the first video:
<li class="yt-lockup2 clearfix yt-uix-tile result-item-padding has-hover-effects yt-lockup2-video yt-lockup2-tile context-data-item" data-context-item-title="Harry Nilsson - Coconut (1971)" data-context-item-views="2,930,881 views" data-context-item-type="video" data-context-item-id="Tbgv8PkO9eo" data-context-item-time="4:32" data-context-item-user="Zoltán Makk">
<div class="yt-lockup2-thumbnail">
<a href="/watch?v=Tbgv8PkO9eo" class="ux-thumb-wrap yt-uix-sessionlink yt-uix-contextlink contains-addto " data-sessionlink="ved=CDIQwBs&ei=prWOUZT9KIK8igLtyICAAQ"> <span class="video-thumb yt-thumb yt-thumb-185" >
<span class="yt-thumb-default">
<span class="yt-thumb-clip">
<span class="yt-thumb-clip-inner">
<img alt="Thumbnail" src="//i1.ytimg.com/vi/Tbgv8PkO9eo/mqdefault.jpg" width="185" >
<span class="vertical-align"></span>
</span>
</span>
</span>
</span>
<span class="video-time">4:32</span>
I want just "/watch?v=Tbgv8PkO9eo" to be returned from it.
Thank You!
Upvotes: 0
Views: 262
Reputation: 26333
You can use HTMLParser. Create your own parser by subclassing the standard-library class and overriding handle_starttag.
from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # Only parse the 'anchor' tag.
        if tag == "a":
            # Check the list of defined attributes.
            for name, value in attrs:
                # If href is defined, print it.
                if name == "href":
                    print name, "=", value
Then create a parser and feed it your HTML string.
your_html_string='<li class="yt-lockup2 clearfix yt-uix-tile result-item- \
padding has-hover-effects yt-lockup2-video yt-lockup2-tile \
context-data-item" data-context-item-title="Harry Nilsson - \
Coconut (1971)" data-context-item-views="2,930,881 views" \
data-context-item-type="video" data-context-item- \
id="Tbgv8PkO9eo" data-context-item-time="4:32" \
data-context-item-user="Zoltán Makk">\
<div class="yt-lockup2-thumbnail">\
<a href="/watch?v=Tbgv8PkO9eo" class="ux-thumb-wrap \
yt-uix-sessionlink yt-uix-contextlink contains-addto" data-\
sessionlink="ved=CDIQwBs&ei=prWOUZT9KIK8igLtyICAAQ">\
<span class="video-thumb yt-thumb yt-thumb-185" >\
<span class="yt-thumb-default"> \
<span class="yt-thumb-clip"> \
<span class="yt-thumb-clip-inner"> \
<img alt="Thumbnail" \
src="//i1.ytimg.com/vi/Tbgv8PkO9eo/mqdefault.jpg" \
width="185" > <span class="vertical-align"></span> \
</span> </span></span></span> \
<span class="video-time">4:32</span>'
parser = MyHTMLParser()
parser.feed(your_html_string)
The result is:
>>>
href = /watch?v=Tbgv8PkO9eo
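The parser above prints every href it sees, so on a full results page it would list many links. To return only the first video link, as the question asks, the same idea can be narrowed down. This is a minimal sketch, not a definitive implementation: it assumes Python 3 module names (html.parser and urllib.parse rather than the Python 2 HTMLParser module), it assumes the first anchor whose href starts with "/watch" belongs to the first video, and the names FirstWatchLinkParser and first_video_url are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstWatchLinkParser(HTMLParser):
    """Remember the first href that starts with /watch (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.first_link = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything once a link has been found.
        if self.first_link is not None or tag != "a":
            return
        for name, value in attrs:
            # Assumption: video links on the results page start with /watch.
            if name == "href" and value and value.startswith("/watch"):
                self.first_link = value
                break


def first_video_url(html, base="http://www.youtube.com"):
    """Return the absolute URL of the first /watch link, or None if absent."""
    parser = FirstWatchLinkParser()
    parser.feed(html)
    if parser.first_link is None:
        return None
    return urljoin(base, parser.first_link)


# Shortened sample based on the markup in the question.
sample = (
    '<div class="yt-lockup2-thumbnail">'
    '<a href="/watch?v=Tbgv8PkO9eo" class="ux-thumb-wrap">first</a>'
    '<a href="/watch?v=other">second</a>'
    '</div>'
)
print(first_video_url(sample))
```

In the question's code, the page text read from the search request would be passed in place of sample. If the page markup differs from the snippet shown in the question, the "/watch" prefix check would need adjusting.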
Upvotes: 1