Vor

Reputation: 35149

What does this error in Beautiful Soup mean?

I'm writing a small script using PyQt4 and BeautifulSoup. You specify a URL, and the script is supposed to download all the pictures from that web page.

When I provide http://yahoo.com, it downloads all the pictures except one. The output is:

...
Download Complete
Download Complete
File name is wrong 
Traceback (most recent call last):
  File "./picture_downloader.py", line 41, in loadComplete
    self.download_image()
  File "./picture_downloader.py", line 58, in download_image
    print 'File name is wrong ',image['src']
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 879, in __getitem__
    return self.attrs[key]
KeyError: 'src'

The output for http://stackoverflow.com is:

Download Complete
File name is wrong  h
Download Complete

And finally, here is part of the code:

# SLOT for loadFinished
def loadComplete(self): 
    self.download_image()

def download_image(self):
    html = unicode(self.frame.toHtml()).encode('utf-8')
    soup = bs(html)

    for image in soup.findAll('img'):
        try:
            file_name = image['src'].split('/')[-1]
            cur_path = os.path.abspath(os.curdir)
            if not os.path.exists(os.path.join(cur_path, 'images/')):
                os.makedirs(os.path.join(cur_path, 'images/'))
            f_path = os.path.join(cur_path, 'images/%s' % file_name)
            urlretrieve(image['src'], f_path)
            print "Download Complete"
        except:
            print 'File name is wrong ',image['src']
    print "No more pictures on the page"

Upvotes: 2

Views: 13104

Answers (2)

root

Reputation: 80366

This means that the img element doesn't have a "src" attribute. You get the same error twice: first in file_name = image['src'].split('/')[-1], and then again in the except block, in print 'File name is wrong ',image['src']. The second KeyError is raised inside the handler, so nothing catches it and you see the traceback.
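A minimal sketch of that double failure, written for Python 3 (the question uses Python 2). The traceback shows that `__getitem__` simply returns `self.attrs[key]`, so a plain dict stands in for the tag's attributes here; the `attrs` contents are made up for illustration.

```python
# Stand-in for tag.attrs of an <img> that has no src attribute (hypothetical data).
attrs = {"alt": "no source"}

errors = []
chained = None
try:
    try:
        # First KeyError: the key is missing.
        file_name = attrs["src"].split("/")[-1]
    except Exception:
        # Second KeyError: the handler looks up the same missing key,
        # so the exception escapes the inner try/except.
        print("File name is wrong", attrs["src"])
except KeyError as exc:
    errors.append(exc.args[0])
    chained = exc.__context__  # the first KeyError, kept as context in Python 3

print(errors)
```

The outer `try` exists only to make the escaping exception visible; in the original script there is no such outer handler, which is why the traceback is printed.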


The simplest way to avoid the problem would be to replace soup.findAll('img') with soup.findAll('img',{"src":True}) so it would only find the elements that have a src attribute.
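As a rough illustration of that filter, the sketch below uses the bs4 spelling `find_all` with the keyword form `src=True`, on a made-up HTML snippet. It assumes Python 3 with bs4 installed and the built-in `html.parser` backend.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <img> has no src attribute.
html = '<img src="/a.png"><img alt="lazy"><img src="/b.png">'
soup = BeautifulSoup(html, "html.parser")

all_imgs = soup.find_all("img")
# src=True keeps only tags that carry a src attribute, whatever its value.
with_src = soup.find_all("img", src=True)

# Safe to index with ['src'] now, because every tag in with_src has it.
sources = [img["src"] for img in with_src]
print(len(all_imgs), sources)
```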


If the URL can be stored in one of two attributes, try something like:

for image in soup.findAll('img'):
    v = image.get('src', image.get('dfr-src'))  # gets "src", else "dfr-src";
                                                # None if both are missing
    if v is None:
        continue  # continue loop with the next image
    # do your stuff

Upvotes: 7

That1Guy

Reputation: 7233

Ok, so this is what's going on. Within your try block, you're getting a KeyError from file_name = image['src'].split('/')[-1] because that img tag does not have a src attribute.

Then, inside your except block, you're trying to access the same attribute that caused the error: print 'File name is wrong ',image['src']. That raises a second KeyError, and this one is not caught.

Examine the img tag causing the error and reevaluate your logic for those cases.
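One possible way to restructure that logic is sketched below, in Python 3 and using only the standard library. The helper name `image_file_name` and the sample data are hypothetical; the dicts stand in for each tag's attributes, which the traceback shows are looked up as `self.attrs[key]`.

```python
import posixpath
from urllib.parse import urlsplit


def image_file_name(attrs):
    """Return a file name for an <img> attribute mapping, or None if unusable."""
    src = attrs.get("src")  # .get returns None instead of raising KeyError
    if not src:
        return None
    # Take the last component of the URL path, ignoring any query string.
    name = posixpath.basename(urlsplit(src).path)
    return name or None


# Hypothetical attribute mappings, one per <img> tag.
samples = [
    {"src": "http://example.com/pics/logo.png"},
    {"alt": "no source"},            # no src at all
    {"src": "http://example.com/"},  # src without a file name
]
names = [image_file_name(a) for a in samples]
print(names)
```

With a helper like this, the download loop can skip any image whose name is None and report it without touching `image['src']` again.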

Upvotes: 2
