Reputation: 49
I'm trying to write a function that downloads all the images from https://www.stocksy.com/service/contributors/ and saves each one under the contributor's name, but it raises an error. I also tried changing 'href' to 'data-profile', without success. Here is the full code:
    import requests
    from bs4 import BeautifulSoup
    import os

    #url = 'https://www.stocksy.com/service/contributors/'
    def imagedown(url, folder):
        try:
            os.mkdir(os.path.join(os.getcwd(), folder))
        except:
            pass
        os.chdir(os.path.join(os.getcwd(), folder))
        r = requests.get(url)
        soup = BeautifulSoup(r.text, 'html.parser')
        images = soup.find_all('img')
        for image in images:
            name = image['href']
            link = image['src']
            with open(name.replace(' ', '-').replace('/', '') + '.jpg', 'wb') as f:
                im = requests.get(link)
                f.write(im.content)
                print('Writing: ', name)

    imagedown('https://www.stocksy.com/service/contributors/', 'contributors')
And here is the error:
> KeyError Traceback (most recent call last)
> Input In [4], in <cell line: 23>()
> 20 f.write(im.content)
> 21 print('Writing: ', name)
> ---> 23 imagedown('https://www.stocksy.com/service/contributors/', 'contributors')
>
> Input In [4], in imagedown(url, folder)
> 14 images = soup.find_all('img')
> 15 for image in images:
> ---> 16 name = image['href']
> 17 link = image['src']
> 18 with open(name.replace(' ', '-').replace('/', '') + '.jpg', 'wb') as f:
>
> File ~\anaconda3\lib\site-packages\bs4\element.py:1519, in Tag.__getitem__(self, key)
> 1516 def __getitem__(self, key):
> 1517 """tag[key] returns the value of the 'key' attribute for the Tag,
> 1518 and throws an exception if it's not there."""
> -> 1519 return self.attrs[key]
>
> KeyError: 'href'
Upvotes: 0
Views: 111
Reputation: 195418
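First, make sure the response is the real page and not a captcha page (setting a User-Agent
HTTP header helps). The KeyError itself occurs because the img tags have no 'href' attribute; the profile name is in the 'data-profile' attribute of the enclosing a tag. Then use the correct CSS selector: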
    import requests
    from bs4 import BeautifulSoup

    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0"
    }

    def imagedown(url, folder):
        r = requests.get(url, headers=headers)
        soup = BeautifulSoup(r.text, "html.parser")
        images = soup.select("a[data-profile] img")
        for i in images:
            name = i.find_next("a").text.strip()
            name2 = i.find_previous("a")["data-profile"]
            link = i["src"]
            print("{:<30} {:<30} {}".format(name, name2, link))
            # download image here
            # ...

    imagedown("https://www.stocksy.com/service/contributors/", "contributors")
Prints:
    ...
    ZHPH Production                zhushman                       https://c.stocksy.com/i/VR6000b1?m=20180926083521
    ZOA PHOTO                      ZOA                            https://c.stocksy.com/i/QL5000b1?m=20171019190548
    Zoran Milich                   zoranmilich                    https://c.stocksy.com/i/48Q000b1?m=20181101164004
    ZQZ Studio                     zqzstudio                      https://c.stocksy.com/i/JnB100b1?m=20220330224825
    Zutik by Andoni                zutik                          https://c.stocksy.com/i/ko0100b1?m=20210115112034
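
The code above leaves the download step as a placeholder. One possible way to fill it in is sketched below. The helper names `safe_filename` and `download_image`, the `timeout` value, and the `.jpg` extension (taken from the question) are assumptions for illustration and not part of the original answer; the actual image format served by the site is not confirmed here.

```python
import re
from pathlib import Path

import requests

HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0"
}


def safe_filename(name, ext=".jpg"):
    """Turn a contributor name into a filesystem-friendly file name (hypothetical helper)."""
    # Replace each run of characters other than letters, digits, '_' or '-' with one '-'.
    stem = re.sub(r"[^\w-]+", "-", name.strip()).strip("-")
    # Fall back to a fixed stem if nothing usable is left.
    return (stem or "unnamed") + ext


def download_image(link, name, folder, timeout=10):
    """Download one image into `folder` and return the path written (hypothetical helper)."""
    target_dir = Path(folder)
    # Create the folder if needed, without changing the working directory.
    target_dir.mkdir(parents=True, exist_ok=True)
    resp = requests.get(link, headers=HEADERS, timeout=timeout)
    # Raise on HTTP errors instead of saving an error page as an image.
    resp.raise_for_status()
    path = target_dir / safe_filename(name)
    path.write_bytes(resp.content)
    return path
```

Inside the loop, the placeholder comment could then be replaced with a call such as `download_image(link, name, folder)`. This assumes the `src` values are absolute URLs, as they are in the printed output; relative links would first need to be joined with the page URL.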
Upvotes: 2