I'm using the Beautiful Soup module to scrape the titles of a list of web pages saved in a CSV file. The script appears to work fine, but once it reaches the 82nd domain it produces the following error:
Traceback (most recent call last):
  File "soup.py", line 31, in <module>
    print soup.title.renderContents() # 'Google'
AttributeError: 'NoneType' object has no attribute 'renderContents'
I'm fairly new to Python, so I'm not sure I understand the error. Could anyone clarify what's going wrong?
My code is:
import csv
import socket
from urllib2 import Request, urlopen, URLError, HTTPError
from BeautifulSoup import BeautifulSoup

debuglevel = 0
timeout = 5
socket.setdefaulttimeout(timeout)

domains = csv.reader(open('domainlist.csv'))
f = open('souput.txt', 'w')

for row in domains:
    domain = row[0]
    req = Request(domain)
    try:
        html = urlopen(req).read()
        print domain
    except HTTPError, e:
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
    except URLError, e:
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    else:
        # everything is fine
        soup = BeautifulSoup(html)
        print soup.title # '<title>Google</title>'
        print soup.title.renderContents() # 'Google'
        f.writelines(domain)
        f.writelines(" ")
        f.writelines(soup.title.renderContents())
        f.writelines("\n")
I was facing the same problem, but reading a couple of related questions and some googling helped me through. Here is what I would suggest for handling a lookup that returns None (the cause of the NoneType error):
import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://webpage.com').read())
scrapped = soup.find(id='whatweseekfor')
if scrapped is None:
    # the element was not found: handle it here, e.g. print None
    print None
else:
    # the element exists, so it is safe to call its methods
    print scrapped.get_text()
Good luck!
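The same "find returns None when nothing matches" pattern can be tried without Beautiful Soup or a network connection. The sketch below is only an illustration under stated assumptions: it swaps in the standard library's xml.etree.ElementTree (which needs well-formed XHTML, not arbitrary HTML), and the helper text_by_id and the PAGE markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML snippet; ElementTree cannot parse arbitrary HTML.
PAGE = """<html><body>
<div id="whatweseekfor">Found it</div>
</body></html>"""


def text_by_id(markup, element_id):
    """Hypothetical helper: return the text of the element with this id, or None."""
    root = ET.fromstring(markup)
    element = root.find(f".//*[@id='{element_id}']")
    if element is None:
        # find() returns None when nothing matches, much like soup.find()
        return None
    # Element exists, so calling its methods is safe
    return "".join(element.itertext())


print(text_by_id(PAGE, "whatweseekfor"))
print(text_by_id(PAGE, "missing"))
```

Note the "is None" test rather than a truth test: an ElementTree element with no children is falsy, so "if element:" would be the wrong check.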
As maozet said, your problem is that soup.title is None for that page (it has no <title> tag). You can check for that value and skip the page, like this:
soup = BeautifulSoup(html)
if soup.title is not None:
    print soup.title # '<title>Google</title>'
    print soup.title.renderContents() # 'Google'
    f.writelines(domain)
    f.writelines(" ")
    f.writelines(soup.title.renderContents())
    f.writelines("\n")
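To see why soup.title can be None, here is a minimal stdlib-only sketch for Python 3. It is an assumption-laden stand-in, not Beautiful Soup: it uses html.parser.HTMLParser, and the names TitleParser and find_title are hypothetical. Like soup.title, find_title gives back None when the page has no <title> tag, and the loop applies the same guard as above.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical parser that collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None  # stays None when the page has no <title>

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is recorded
        if tag == "title" and self.title is None:
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append
        if self.in_title:
            self.title += data


def find_title(html):
    """Hypothetical helper: return the <title> text, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


pages = {
    "with-title": "<html><head><title>Google</title></head></html>",
    "no-title": "<html><body>No head here</body></html>",
}

for name, html in pages.items():
    title = find_title(html)
    if title is not None:
        print(name, title)
    else:
        # Calling a method on None here would raise AttributeError
        print(name, "has no <title>, skipping")
```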
What if a page doesn't have a title?
I had this problem once. Just wrap the code in a try/except block, or check for a title first.
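Both options can be compared side by side in a small Python 3 sketch. As an assumption for illustration only, a regular expression stands in for Beautiful Soup (it is not a robust HTML parser): re.search returns None when there is no title, so calling .group on it fails with the same kind of error, AttributeError: 'NoneType' object has no attribute 'group'. The names TITLE_RE, title_lbyl and title_eafp are hypothetical.

```python
import re

# Regex stand-in for soup.title; not a real HTML parser
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def title_lbyl(html):
    """Check first: test for None before using the match."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    return match.group(1).strip()


def title_eafp(html):
    """try/except: attempt the call and catch the failure."""
    try:
        return TITLE_RE.search(html).group(1).strip()
    except AttributeError:
        # search() returned None, so .group raised AttributeError
        return None


for html in ("<title> Google </title>", "<p>no title</p>"):
    print(repr(title_lbyl(html)), repr(title_eafp(html)))
```

The explicit check is usually the safer choice here: a bare "except AttributeError" around a longer block can also hide unrelated bugs, such as a misspelled method name.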