Reputation: 35149
I'm writing a small script using PyQt4 and BeautifulSoup. Basically, you specify a URL and then the script is supposed to download all the pictures from the web page.
In the output, when I provide http://yahoo.com, it downloads all the pictures except one:
...
Download Complete
Download Complete
File name is wrong
Traceback (most recent call last):
  File "./picture_downloader.py", line 41, in loadComplete
    self.download_image()
  File "./picture_downloader.py", line 58, in download_image
    print 'File name is wrong ',image['src']
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 879, in __getitem__
    return self.attrs[key]
KeyError: 'src'
The output from http://stackoverflow.com is:
Download Complete
File name is wrong h
Download Complete
And finally, here is part of the code:
# SLOT for loadFinished
def loadComplete(self):
    self.download_image()

def download_image(self):
    html = unicode(self.frame.toHtml()).encode('utf-8')
    soup = bs(html)
    for image in soup.findAll('img'):
        try:
            file_name = image['src'].split('/')[-1]
            cur_path = os.path.abspath(os.curdir)
            if not os.path.exists(os.path.join(cur_path, 'images/')):
                os.makedirs(os.path.join(cur_path, 'images/'))
            f_path = os.path.join(cur_path, 'images/%s' % file_name)
            urlretrieve(image['src'], f_path)
            print "Download Complete"
        except:
            print 'File name is wrong ',image['src']
    print "No more pictures on the page"
Upvotes: 2
Views: 13104
Reputation: 80366
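This means that the image element doesn't have a "src" attribute, and you get the same error twice: once in file_name = image['src'].split('/')[-1] and then again in the except block, in print 'File name is wrong ',image['src']. The second KeyError is raised inside the handler, so nothing catches it, and that is the one shown in the traceback.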
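To make the "same error twice" point concrete, here is a minimal sketch. It assumes a plain dict can stand in for an img tag without a src attribute (a dict raises KeyError on a missing key, just as the traceback shows the tag doing); the function name describe_failure is made up for illustration.

```python
def describe_failure(image):
    """Report which parts of the question's try/except pattern raise."""
    events = []
    try:
        try:
            image["src"].split("/")[-1]   # first lookup, as in the try block
        except Exception:
            events.append("try block failed")
            image["src"]                  # the handler repeats the failing lookup
    except KeyError:
        # Only reached because the handler itself raised a second KeyError.
        events.append("except block failed")
    return events


# Hypothetical stand-in for an <img> tag that has no "src" attribute.
print(describe_failure({"alt": "logo"}))
```

In the original script there is no outer try, so the second KeyError propagates out of download_image and stops the loop.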
The simplest way to avoid the problem would be to replace soup.findAll('img') with soup.findAll('img',{"src":True}) so it would only find the elements that have a src attribute.
If the image URL can be stored in either of two attributes, try something like:
for image in soup.findAll('img'):
    # gets "src", else "dfr-src"; None if both are missing
    v = image.get('src', image.get('dfr-src'))
    if v is None:
        continue  # continue loop with the next image
    # do your stuff
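The same fallback can be pulled into a small helper if more attribute names turn up later. This is only a sketch: first_present and the sample data are hypothetical, and plain dicts stand in for the parsed tags, on the assumption that the helper needs nothing beyond the dict-style .get() already used above.

```python
def first_present(tag, names=("src", "dfr-src")):
    """Return the first attribute value found on tag, or None if all are missing."""
    for name in names:
        value = tag.get(name)
        if value is not None:
            return value
    return None


# Hypothetical sample data; each dict stands in for one <img> tag.
images = [
    {"src": "http://example.com/a.png"},
    {"dfr-src": "http://example.com/b.png"},
    {"alt": "no source at all"},
]

# Keep only the images for which some source attribute was found.
urls = [url for url in map(first_present, images) if url is not None]
print(urls)
```

The third entry is skipped silently, which is the same effect as the continue in the loop above.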
Upvotes: 7
Reputation: 7233
Ok, so this is what's going on. Within your try-except, you're getting a KeyError from file_name = image['src'].split('/')[-1] because that object does not have a src attribute.
Then, inside your except block, you're trying to access the same attribute that caused the error: print 'File name is wrong ',image['src'].
Examine the img tag causing the error and reevaluate your logic for those cases.
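One way to examine the offending tags without crashing is to set them aside instead of indexing them again. The following is a rough sketch under stated assumptions: collect_file_names is a made-up name, and plain dicts stand in for the img tags, relying only on a dict-style .get().

```python
def collect_file_names(images):
    """Split images into usable file names and tags that lack a src."""
    file_names, skipped = [], []
    for image in images:
        src = image.get("src")      # returns None instead of raising KeyError
        if not src:
            skipped.append(image)   # keep the whole tag for later inspection
            continue
        file_names.append(src.split("/")[-1])
    return file_names, skipped


# Hypothetical sample data; each dict stands in for one <img> tag.
names, skipped = collect_file_names([
    {"src": "http://example.com/img/logo.png"},
    {"alt": "spacer", "width": "1"},
])
print(names)
print(skipped)
```

Printing the skipped entries shows which attributes those tags do carry, which is usually enough to decide how to handle them.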
Upvotes: 2