Reputation: 103
I need backlight image data, so I'm trying to download backlight images from Pixabay. However, the following code downloads only 16 images.
I looked into why and found a difference in the HTML source. The images that were downloaded are in "img srcset" tags, and my code downloads the first picture in the srcset. The other pictures are in "img src", and my code can't download them. Does anyone know what the problem is?
from bs4 import BeautifulSoup
import urllib.request
import os.path
url="https://pixabay.com/images/search/backlight/"
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
urllib.request.install_opener(opener)
req = urllib.request.Request(url)
response = urllib.request.urlopen(req)
source = response.read()
soup = BeautifulSoup(source, "html.parser")
img = soup.find_all("img")
cnt = 0
for image in img:
    img_src=image.get("src")
    if img_src[0]=='/':
        continue
    cnt += 1
    print(img_src)
    path = "C:/Users/Guest001/Test/" + str(cnt) + ".jpg"
    print(path)
    urllib.request.urlretrieve(img_src, path)
Upvotes: 0
Views: 571
Reputation: 3113
Some of the images have /static/img/blank.gif in their src attribute, and the real URL is in the data-lazy attribute. Also, some of the images have a .png suffix. Here is a working example.
from bs4 import BeautifulSoup
import urllib.request
import os.path

url = "https://pixabay.com/images/search/backlight/"
out_dir = "C:/Users/Guest001/Test/"

# Send a browser-like User-Agent with every request made through urllib.
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
urllib.request.install_opener(opener)

req = urllib.request.Request(url)
response = urllib.request.urlopen(req)
source = response.read()
soup = BeautifulSoup(source, "html.parser")
img = soup.find_all("img")
cnt = 0
for image in img:
    # Lazy-loaded images have a placeholder .gif in src; the real URL is in data-lazy.
    img_src = image.get("src") or ""
    if '.gif' in img_src:
        img_src = image.get("data-lazy") or ""
    # Skip tags with no usable URL and site-relative assets.
    if not img_src or img_src[0] == '/':
        continue
    # Keep the original extension; skip anything that is neither .jpg nor .png.
    if '.jpg' in img_src:
        ext = ".jpg"
    elif '.png' in img_src:
        ext = ".png"
    else:
        continue
    cnt += 1
    print(img_src)
    path = os.path.join(out_dir, str(cnt) + ext)
    print(path)
    urllib.request.urlretrieve(img_src, path)
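If it helps, the URL-selection rule described above (use data-lazy when src holds the blank.gif placeholder, skip site-relative paths) can be pulled out into small functions that are checkable without hitting the network. This is only a sketch: the names pick_image_url and extension_for are made up for illustration, the example.com URLs are placeholders, and it assumes the page still uses the blank.gif / data-lazy pattern.

```python
import os.path
from urllib.parse import urlparse


def pick_image_url(attrs):
    """Return the real image URL for an <img>, or None if there is none.

    `attrs` is anything with a dict-style .get(), such as a plain dict
    or a BeautifulSoup Tag. (Hypothetical helper, not part of any library.)
    """
    src = attrs.get("src") or ""
    # Assumption: lazy-loaded images carry a blank.gif placeholder in src
    # and the real URL in data-lazy.
    if not src or src.endswith("blank.gif"):
        src = attrs.get("data-lazy") or ""
    # Site-relative paths (starting with "/") are treated as page assets.
    if not src or src.startswith("/"):
        return None
    return src


def extension_for(url):
    """Return the lower-cased extension of the URL path, ignoring any query string."""
    return os.path.splitext(urlparse(url).path)[1].lower()
```

Inside the loop you could then write `img_src = pick_image_url(image)` and `continue` when it returns None, and build the file name from `str(cnt) + extension_for(img_src)` instead of checking for '.jpg' and '.png' by hand.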
Upvotes: 2