Reputation: 73
I am trying to ask the user for a search term, search Yahoo for it, and print the link of the first result. Here's the code I have so far.
from urllib import urlopen
import re, time
from BeautifulSoup import BeautifulSoup
print "What Would You Like to Search For?"
user_input = raw_input('') #Gets Search Term from User
search = "http://search.yahoo.com/search;_ylt=A2KLtaJX_1BQfT4AwX2bvZx4?p=baker&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t-701"
new_search = search.replace('baker', user_input)
content = urlopen( new_search ).read()
soupcontent = BeautifulSoup(content)
link1 = soupcontent.find(id="link-1")
print link1
Everything works fine: it takes the user input and searches Yahoo. The problem is that if I search for 'dog', the program prints something like this:
<a id="link-1" class="yschttl spt" href="http://www.dog.com/" data-bk="5101.1"><b>Dog</b> Supplies | <b>Dog</b> Food, <b>Dog</b> Beds, <b>Dog</b> <wbr></wbr>Flea Control & More ...</a>
That is indeed the first link on the page, but I would like it to print only "http://www.dog.com/". Can anyone help me with this?
Thanks.
Upvotes: 1
Views: 787
Reputation: 11322
Try using a regular expression (see http://docs.python.org/library/re.html) to capture the href value from the string form of the tag:
match = re.search(r'href="(http://.*?)"', str(link1))
print match.group(1)
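One caveat: re.search returns None when nothing matches, so calling .group(1) directly would raise an AttributeError if the page layout changes. Here is a minimal sketch of a guarded version, written for Python 3. The helper name first_href and the sample markup are my own assumptions for illustration, and I widened the pattern to https? so it also accepts https links.

```python
import re

# Sample markup modeled on the output shown in the question; a live page may differ.
link_html = '<a id="link-1" class="yschttl spt" href="http://www.dog.com/">Dog Supplies</a>'


def first_href(html):
    """Return the first http(s) href value found in html, or None if there is none.

    first_href is a hypothetical helper name, not part of any library.
    """
    match = re.search(r'href="(https?://.*?)"', html)
    # re.search returns None when nothing matches, so check before calling group().
    return match.group(1) if match else None


print(first_href(link_html))
print(first_href('<p>no link here</p>'))
```

In the question's code you would pass str(link1) as the argument.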
Upvotes: 1
Reputation: 353209
BeautifulSoup actually makes this very easy:
>>> from bs4 import BeautifulSoup
>>> from urllib2 import urlopen
>>>
>>> url = 'http://search.yahoo.com/search?p=dog'
>>> content = urlopen(url).read()
>>> soup = BeautifulSoup(content)
>>>
>>> soup.find(id="link-1")
<a class="yschttl spt" data-bk="5097.1" href="http://www.dog.com/" id="link-1"><b>Dog</b> Supplies | <b>Dog</b> Food, <b>Dog</b> Beds, <b>Dog</b> <wbr></wbr>Flea Control & More ...</a>
>>> soup.find(id="link-1").get("href")
'http://www.dog.com/'
Because your URL asks for UTF-8 (the ei=UTF-8 parameter), you'll probably see
u'http://www.dog.com/'
instead. That is the Unicode string version of the same value, which is fine too.
Standard warning: be sure to check that Yahoo!'s end-user license permits whatever you want to do, because many licenses rule out certain automated uses.
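A side note on plugging the user's search term into a URL like the one above: replacing text inside the URL string can break for terms that contain spaces or characters such as &. A rough sketch of one way to handle that is to let urlencode do the escaping. This is written for Python 3; build_search_url and SEARCH_URL are names I made up for the example, and I am assuming p is the only query parameter you need.

```python
from urllib.parse import urlencode  # on Python 2: from urllib import urlencode

# Endpoint taken from the interactive session above; the other query
# parameters from the question's URL are left out on purpose (an assumption).
SEARCH_URL = 'http://search.yahoo.com/search'


def build_search_url(term):
    """Return the search URL with term percent-encoded as the p parameter.

    build_search_url is a hypothetical helper name used for illustration.
    """
    return SEARCH_URL + '?' + urlencode({'p': term})


print(build_search_url('dog'))
print(build_search_url('dog food & treats'))
```

The result can be passed to urlopen in place of the hand-edited search string.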
Upvotes: 1