Reputation: 609
Ok, I'm trying to use an opener with Beautiful Soup to extract some info from a page, and I think that's where the problem is arising. I need to use an opener because I have to route the request through Tor, as I think they have blocked multiple requests.
(If this is all unformatted I'll edit straight away, as usually something weird happens.)
Here's the code:
def getsite():
    proxy = urllib2.ProxyHandler({"http" : "127.0.0.1:8118"})
    opener = urllib2.build_opener(proxy)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    url = opener.open('https://www.website.com')
    try:
        page = BeautifulSoup(urllib2.urlopen(url).read())
    except Exception as Err:
        errorlist.append('Unexpected Error ' + str(Err))
        time.sleep(60)
        page = BeautifulSoup(urllib2.urlopen(url).read())
    values = page.findAll("strong")
    high = values[2]
    low = values[1]
    last = values[0]
    vol = values[3]
    high = str(high)
    low = str(low)
    last = str(last)
    vol = str(vol)
    high = high[8:-13]
    low = low[8:-13]
    last = last[8:-13]
    vol = vol[8:-24]
    print high, low, last, vol

while True:
    getsite()
    time.sleep(3200)
And it throws this error:
    page = BeautifulSoup(urllib2.urlopen(url).read())
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 392, in open
    protocol = req.get_type()
AttributeError: addinfourl instance has no attribute 'get_type'
Upvotes: 3
Views: 12577
Reputation: 142206
Looks like you're using the result of opener.open() as though it were a URL:
page = BeautifulSoup(urllib2.urlopen(url).read())
where url is actually the response object returned by opener.open(), not a URL string. Instead, do:
page = BeautifulSoup(url.read())
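For completeness, here is a minimal sketch of the corrected fetch, assuming the same Privoxy/Tor proxy on 127.0.0.1:8118 and the placeholder URL from the question (the <strong>-parsing logic is unchanged from the question and omitted here):

import urllib2
from BeautifulSoup import BeautifulSoup  # or: from bs4 import BeautifulSoup

def getsite():
    # Build an opener that routes requests through the local Tor/Privoxy proxy.
    proxy = urllib2.ProxyHandler({"http": "127.0.0.1:8118"})
    opener = urllib2.build_opener(proxy)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]

    # opener.open() already returns a response (an addinfourl instance),
    # so read it directly instead of passing it back to urllib2.urlopen().
    response = opener.open('https://www.website.com')  # placeholder URL
    page = BeautifulSoup(response.read())
    return page.findAll("strong")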
Upvotes: 7