Reputation: 41
I'm trying to create a web scraper for a website (https://pokemondb.net/pokedex/national) that copies a list of images and saves them in a directory. Everything seems to work, except that instead of picking up the 800+ items I was hoping for, it only picks up 12. I've tried using Selenium's implicitly_wait, but it doesn't seem to work. I would like it to scrape every picture on the page.
Below is my code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import shutil
import os
import requests

def spritescrape(driver):
    sprites_list = driver.find_elements_by_tag_name('img')
    sprite_srcs = [sprite.get_attribute('src') for sprite in sprites_list]
    return sprite_srcs

def download_images(srcs, dirname):
    for index, src in enumerate(srcs):
        response = requests.get(src, stream=True)
        save_image(response, dirname, index)
        del response

def save_image(image, dirname, suffix):
    with open('{dirname}/img_{suffix}.jpg'.format(dirname=dirname, suffix=suffix), 'wb') as out_file:
        shutil.copyfileobj(image.raw, out_file)

def make_dir(dirname):
    current_path = os.getcwd()
    path = os.path.join(current_path, dirname)
    if not os.path.exists(path):
        os.makedirs(path)

if __name__ == '__main__':
    chromeexe_path = r'C:\code\Learning Python\Scrapers\chromedriver.exe'
    driver = webdriver.Chrome(executable_path=chromeexe_path)
    driver.get(r'https://pokemondb.net/pokedex/national')
    driver.implicitly_wait(10)
    sprite_links = spritescrape(driver)
    dirname = 'sprites'
    make_dir(dirname)
    download_images(sprite_links, dirname)
I've heard that some websites can be built in ways that prevent scraping, and I wonder if this is the case for this website. I'm very new to coding, so any help with getting all of the images would be greatly appreciated!
Upvotes: 0
Views: 386
Reputation: 193208
The elements on the page use lazy loading. To extract the src attributes of all the images, you have to scroll down to the end of the page, waiting for new images to load after each scroll. You can use the following locator strategy:
Code Block:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get("https://pokemondb.net/pokedex/national")
# Images that are visible right after the page loads
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//img[@class]")))
myLength = len(elements)
while True:
    try:
        # Scroll down, then wait until more images exist than before
        driver.execute_script("window.scrollBy(0,1500)", "")
        WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//img[@class]")))
        WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements_by_xpath("//img[@class]")) > myLength)
        elements = driver.find_elements_by_xpath("//img[@class]")
        myLength = len(elements)
    except TimeoutException:
        # No new images appeared within 20 seconds, so stop scrolling
        break
print(myLength)
for element in elements:
    print(element.get_attribute("src"))
driver.quit()
Console Output:
890
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/bulbasaur.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/ivysaur.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/venusaur.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/charmander.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/charmeleon.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/charizard.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/squirtle.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/wartortle.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/blastoise.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/caterpie.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/metapod.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/butterfree.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/weedle.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/kakuna.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/beedrill.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pidgey.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pidgeotto.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pidgeot.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/rattata.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/raticate.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/spearow.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/fearow.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/ekans.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/arbok.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pikachu.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/raichu.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/sandshrew.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/sandslash.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoran-f.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidorina.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoqueen.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoran-m.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidorino.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoking.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/clefairy.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/clefable.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/vulpix.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/ninetales.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/jigglypuff.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/wigglytuff.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/zubat.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/golbat.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/oddish.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/gloom.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/vileplume.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/paras.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/parasect.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/venonat.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/venomoth.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/diglett.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/dugtrio.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/meowth.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/persian.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/psyduck.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/golduck.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/mankey.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/primeape.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/growlithe.png
https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/arcanine.png
.
.
.
https://img.pokemondb.net/sprites/sword-shield/pixel/dreepy.png
https://img.pokemondb.net/sprites/sword-shield/pixel/drakloak.png
https://img.pokemondb.net/sprites/sword-shield/pixel/dragapult.png
https://img.pokemondb.net/sprites/sword-shield/pixel/zacian-crowned.png
https://img.pokemondb.net/sprites/sword-shield/pixel/zamazenta-crowned.png
https://img.pokemondb.net/sprites/sword-shield/pixel/eternatus.png
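Once you have the list of src values, you may want to save each file under its own name instead of img_0.jpg, img_1.jpg and so on (the URLs above end in .png, not .jpg). Here is a minimal sketch of that step. The helper names filename_from_url and plan_downloads are my own and not part of Selenium or requests, and it only builds the target paths; the actual download can stay as in the question.

```python
import os
import posixpath
from urllib.parse import urlparse


def filename_from_url(src, fallback="image"):
    """Return the last path segment of a URL, e.g. 'eternatus.png'."""
    name = posixpath.basename(urlparse(src).path)
    return name or fallback


def plan_downloads(srcs, dirname):
    """Map each unique, non-empty src to a target path inside dirname."""
    targets = {}
    for src in srcs:
        # Skip None/empty values (images without a src) and duplicates
        if src and src not in targets:
            targets[src] = os.path.join(dirname, filename_from_url(src))
    return targets
```

Note that two different URLs ending in the same file name would map to the same path, so the later download would overwrite the earlier one.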
Upvotes: 0
Reputation: 33384
You need to scroll the page to the bottom. However, if you jump directly to the scrollHeight, you will lose all the elements again. You need to use an infinite loop, scroll slowly a bit at a time, and collect the src attribute of the elements while scrolling so that none of them are lost. I got 890 elements.
Try the code below.
import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://pokemondb.net/pokedex/national")
sprite_srcs=[]
height=1000
itemsnobefore=len(sprite_srcs)
while True:
    driver.execute_script("window.scrollTo(0," + str(height) + ");")
    sprites_list = driver.find_elements_by_tag_name('img')
    for sprite in sprites_list:
        if sprite.get_attribute('src') not in sprite_srcs:
            sprite_srcs.append(sprite.get_attribute('src'))
    itemsnoafter=len(sprite_srcs)
    #Break the loop when there is no more image tag left
    if itemsnobefore==itemsnoafter:
        break
    itemsnobefore=itemsnoafter
    height=height+500
    time.sleep(0.25)
print(len(sprites_list))
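The same idea can be wrapped in a reusable function. This is only a sketch under a few assumptions: the function name collect_srcs_while_scrolling and the get_srcs callback are my own, a set is used so the membership check stays fast, and max_steps is a safety limit I added. Like the loop above, it stops as soon as one scroll step reveals nothing new, so a page with a long gap between images could end early.

```python
import time


def collect_srcs_while_scrolling(driver, get_srcs, step=500, pause=0.25, max_steps=1000):
    """Scroll down in fixed steps, collecting each new src once, in order."""
    seen = set()
    ordered = []
    height = step
    for _ in range(max_steps):
        # execute_script passes extra arguments to the script as arguments[0], ...
        driver.execute_script("window.scrollTo(0, arguments[0]);", height)
        time.sleep(pause)  # give lazy-loaded images a moment to appear
        before = len(ordered)
        for src in get_srcs(driver):
            if src and src not in seen:
                seen.add(src)
                ordered.append(src)
        if len(ordered) == before:
            break  # this step produced nothing new
        height += step
    return ordered
```

With the driver from the code above, get_srcs could be something like lambda d: [img.get_attribute('src') for img in d.find_elements_by_tag_name('img')].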
Upvotes: 2
Reputation: 1119
Not all the elements load when the page first opens. It appears they only load as you scroll down the page. What I've done in situations like this is scroll to the bottom of the page first and then find the elements. This has worked for my needs.
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
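If the page keeps growing as content loads, one jump may not be enough. A rough sketch of repeating the jump until document.body.scrollHeight stops changing is below. The function name scroll_until_height_stable, the pause and the max_rounds limit are my own choices, and I have not verified that jumping to the bottom triggers every lazy-loaded image on this particular page (another answer in this thread reports that it does not).

```python
import time


def scroll_until_height_stable(driver, pause=0.5, max_rounds=50):
    """Jump to the bottom repeatedly until the page height stops changing.

    Returns the number of jumps that were performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # wait for newly triggered content to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing was added after this jump
        last_height = new_height
    return max_rounds
```

After it returns, find the elements as usual.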
Upvotes: 0