midomid

Reputation: 49

Fixing an error in a Python web image downloader script

I've tried to build a function that downloads multiple images from this web page, https://www.stocksy.com/service/contributors/, and names each file after the contributor, but I get an error. I also tried changing 'href' to 'data-profile'. Here is the full code:

import requests
from bs4 import BeautifulSoup
import os

#url = 'https://www.stocksy.com/service/contributors/'
def imagedown(url, folder):
    try:
        os.mkdir(os.path.join(os.getcwd(), folder))
    except:
        pass
    os.chdir(os.path.join(os.getcwd(), folder))
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    images = soup.find_all('img')
    for image in images:
        name = image['href']
        link = image['src']
        with open(name.replace(' ', '-').replace('/', '') + '.jpg', 'wb') as f:
            im = requests.get(link)
            f.write(im.content)
            print('Writing: ', name)

imagedown('https://www.stocksy.com/service/contributors/', 'contributors')

And here is the error:

> KeyError                                  Traceback (most recent call last)
> Input In [4], in <cell line: 23>()
>      20             f.write(im.content)
>      21             print('Writing: ', name)
> ---> 23 imagedown('https://www.stocksy.com/service/contributors/', 'contributors')
> 
> Input In [4], in imagedown(url, folder)
>      14 images = soup.find_all('img')
>      15 for image in images:
> ---> 16     name = image['href']
>      17     link = image['src']
>      18     with open(name.replace(' ', '-').replace('/', '') + '.jpg', 'wb') as f:
> 
> File ~\anaconda3\lib\site-packages\bs4\element.py:1519, in Tag.__getitem__(self, key)
>    1516 def __getitem__(self, key):
>    1517     """tag[key] returns the value of the 'key' attribute for the Tag,
>    1518     and throws an exception if it's not there."""
> -> 1519     return self.attrs[key]
> 
> KeyError: 'href'

Upvotes: 0

Views: 111

Answers (1)

Andrej Kesely

Reputation: 195418

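First, make sure you didn't get a CAPTCHA page (try setting the User-Agent HTTP header). The KeyError is raised because image['href'] looks up an attribute the <img> tag does not have. Use a CSS selector that targets the images inside the profile links, then read the name from the neighbouring <a> tags: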

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0"
}


def imagedown(url, folder):
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, "html.parser")
    images = soup.select("a[data-profile] img")
    for i in images:
        name = i.find_next("a").text.strip()
        name2 = i.find_previous("a")["data-profile"]
        link = i["src"]

        print("{:<30} {:<30} {}".format(name, name2, link))
        # download image here
        # ...


imagedown("https://www.stocksy.com/service/contributors/", "contributors")

Prints:


...

ZHPH Production                zhushman                       https://c.stocksy.com/i/VR6000b1?m=20180926083521
ZOA PHOTO                      ZOA                            https://c.stocksy.com/i/QL5000b1?m=20171019190548
Zoran Milich                   zoranmilich                    https://c.stocksy.com/i/48Q000b1?m=20181101164004
ZQZ Studio                     zqzstudio                      https://c.stocksy.com/i/JnB100b1?m=20220330224825
Zutik by Andoni                zutik                          https://c.stocksy.com/i/ko0100b1?m=20210115112034
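
The "download image here" step is left open above. One way to fill it in might look like the sketch below. The helper names safe_filename and save_images, the fetch parameter, and the ".jpg" extension are assumptions made for illustration and are not part of the original answer. The downloader is passed in as a callable so the file-naming logic can be tried without touching the network.

```python
import re
from pathlib import Path


def safe_filename(name, ext=".jpg"):
    """Turn a contributor name into a filesystem-friendly file name.

    Runs of characters other than letters, digits, underscores and hyphens
    are collapsed into a single hyphen. The ".jpg" default is an assumption;
    the real image type is not checked here.
    """
    cleaned = re.sub(r"[^\w\-]+", "-", name.strip()).strip("-")
    return (cleaned or "unnamed") + ext


def save_images(items, folder, fetch):
    """Save each (name, url) pair in `items` as a file inside `folder`.

    `fetch` is any callable that takes a URL and returns the image bytes,
    for example:
        lambda url: requests.get(url, headers=headers, timeout=30).content
    Returns the list of paths that were written.
    """
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    saved = []
    for name, url in items:
        path = out_dir / safe_filename(name)
        path.write_bytes(fetch(url))
        saved.append(path)
    return saved
```

Inside imagedown, the pairs could be collected with something like items = [(i.find_next("a").text.strip(), i["src"]) for i in images] and then passed to save_images(items, folder, fetch). Using mkdir with exist_ok=True also avoids the bare try/except around os.mkdir and the os.chdir call in the question's version.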

Upvotes: 2
