Reputation: 434
I am trying to grab a link from a website: the link to the audio pronunciation of a word. The page is http://dictionary.reference.com/browse/would?s=t
I am using the following code to get the link, but the result comes up blank. This is odd, because a similar setup works when I pull data for a stock. The idea is to build a program that plays the sound of a word and then asks for its spelling; it is mainly for my kids. I need to go through a list of words and collect the links from the dictionary site, but I am having trouble getting the link to print. I'm using urllib and re; the code is below.
import urllib
import re

words = ["would", "your", "apple", "orange"]
for word in words:
    urll = "http://dictionary.reference.com/browse/" + word + "?s=t"  # produces link
    htmlfile = urllib.urlopen(urll)
    htmltext = htmlfile.read()
    regex = '<a class="speaker" href =>(.+?)</a>'  # puts tag together
    pattern = re.compile(regex)
    link = re.findall(pattern, htmltext)
    print "the link for the word", word, link  # should print link
This is the expected output for the word "would": http://static.sfdict.com/staticrep/dictaudio/W02/W0245800.mp3
Upvotes: 1
Views: 637
Reputation: 474021
You should fix your regular expression to grab everything inside the href attribute value:
<a class="speaker" href="(.*?)"
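To see why the original pattern returns nothing, here is a minimal sketch that runs both patterns against a made-up snippet. The `sample_html` string is an assumption modeled on the question's description; the real page markup may differ (for example in attribute order or spacing), in which case the pattern would need adjusting.

```python
import re

# Hypothetical markup modeled on the question; the live page may differ.
sample_html = (
    '<div><a class="speaker" '
    'href="http://static.sfdict.com/staticrep/dictaudio/W02/W0245800.mp3">'
    'listen</a></div>'
)

# Original pattern: requires the literal text ' href =>' right after the
# class attribute, which does not occur in the snippet, so it cannot match.
broken = re.compile(r'<a class="speaker" href =>(.+?)</a>')

# Corrected pattern: lazily capture everything between the href quotes.
fixed = re.compile(r'<a class="speaker" href="(.*?)"')

broken_links = broken.findall(sample_html)
fixed_links = fixed.findall(sample_html)

print(broken_links)
print(fixed_links)
```

With this snippet the original pattern finds no matches, while the corrected one captures the URL stored in the href attribute.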
Note that you should really consider switching from regex to an HTML parser, such as BeautifulSoup.
Here is how you can apply BeautifulSoup in this case:
import urllib
from bs4 import BeautifulSoup

words = ["would", "your", "apple", "orange"]
for word in words:
    urll = "http://dictionary.reference.com/browse/" + word + "?s=t"  # produces link
    htmlfile = urllib.urlopen(urll)
    soup = BeautifulSoup(htmlfile, "html.parser")
    links = [link["href"] for link in soup.select("a.speaker")]
    print(word, links)
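If installing BeautifulSoup is not an option, a similar idea can be sketched with the standard library's `html.parser` module. This swaps in a different parser from the one recommended above and is only an illustration: the `SpeakerLinkParser` class and the `sample_html` snippet are hypothetical, and the code is written for Python 3 rather than the Python 2 used in the question.

```python
from html.parser import HTMLParser


class SpeakerLinkParser(HTMLParser):
    """Collects href values from <a> tags whose class list contains 'speaker'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        if "speaker" in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


# Hypothetical markup: href comes before class, and there is an extra class
# name, which is the kind of variation a fixed regex would miss.
sample_html = (
    '<a href="http://example.com/audio.mp3" class="speaker audio">listen</a>'
    '<a class="other" href="/x">x</a>'
)

parser = SpeakerLinkParser()
parser.feed(sample_html)
speaker_links = parser.links
print(speaker_links)
```

Because the parser looks at attributes by name, it does not depend on the order or spacing of attributes inside the tag.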
Upvotes: 2