zeshaykes

Reputation: 41

Why is selenium only picking up the first 12 items?

I'm trying to create a web scraper for a website (https://pokemondb.net/pokedex/national) that copies a list of images and saves them in a directory. Everything seems to work, except that instead of picking up the 800+ items I was hoping for, it only picks up 12. I've tried using selenium's implicitly_wait, but it doesn't seem to help. I would like it to scrape every picture on the page.

Below is my code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import shutil
import os
import requests

def spritescrape(driver):
    sprites_list = driver.find_elements_by_tag_name('img')
    sprite_srcs = [sprite.get_attribute('src') for sprite in sprites_list]
    return sprite_srcs

def download_images(srcs, dirname):
    for index, src in enumerate(srcs):
        response = requests.get(src, stream=True)
        save_image(response, dirname, index)
    del response

def save_image(image, dirname, suffix):
    with open('{dirname}/img_{suffix}.jpg'.format(dirname=dirname, suffix=suffix), 'wb') as out_file:
        shutil.copyfileobj(image.raw, out_file)

def make_dir(dirname):
    current_path = os.getcwd()
    path = os.path.join(current_path, dirname)
    if not os.path.exists(path):
        os.makedirs(path)

if __name__ == '__main__':
    chromeexe_path = r'C:\code\Learning Python\Scrapers\chromedriver.exe'
    driver = webdriver.Chrome(executable_path=chromeexe_path)
    driver.get(r'https://pokemondb.net/pokedex/national')
    driver.implicitly_wait(10)

    sprite_links = spritescrape(driver)
    dirname = 'sprites'
    make_dir(dirname)
    download_images(sprite_links, dirname)

I've heard that some websites can be built in ways that prevent scraping, and I wonder if this is the case for this website. I'm very new to coding, so any help with getting all of the images would be greatly appreciated!

Upvotes: 0

Views: 386

Answers (3)

undetected Selenium

Reputation: 193208

The images on this page are lazy-loaded, so to extract the src attributes of all the images you have to scroll down to the end of the page. You can use the following locator strategy:

  • Code Block:

    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get("https://pokemondb.net/pokedex/national")
    # Wait for the initially visible images and remember how many there are
    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//img[@class]")))
    myLength = len(elements)
    while True:
        try:
            # Scroll down to trigger loading of the next batch of images
            driver.execute_script("window.scrollBy(0,1500)")
            WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//img[@class]")))
            # Wait until more images exist than before; a timeout means nothing new loaded
            WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements_by_xpath("//img[@class]")) > myLength)
            elements = driver.find_elements_by_xpath("//img[@class]")
            myLength = len(elements)
        except TimeoutException:
            break
    print(myLength)
    for element in elements:
        print(element.get_attribute("src"))
    driver.quit()
    
  • Console Output:

    890
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/bulbasaur.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/ivysaur.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/venusaur.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/charmander.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/charmeleon.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/charizard.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/squirtle.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/wartortle.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/blastoise.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/caterpie.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/metapod.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/butterfree.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/weedle.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/kakuna.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/beedrill.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pidgey.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pidgeotto.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pidgeot.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/rattata.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/raticate.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/spearow.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/fearow.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/ekans.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/arbok.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pikachu.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/raichu.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/sandshrew.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/sandslash.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoran-f.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidorina.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoqueen.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoran-m.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidorino.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoking.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/clefairy.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/clefable.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/vulpix.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/ninetales.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/jigglypuff.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/wigglytuff.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/zubat.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/golbat.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/oddish.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/gloom.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/vileplume.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/paras.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/parasect.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/venonat.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/venomoth.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/diglett.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/dugtrio.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/meowth.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/persian.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/psyduck.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/golduck.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/mankey.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/primeape.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/growlithe.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/arcanine.png
    .
    .
    .
    https://img.pokemondb.net/sprites/sword-shield/pixel/dreepy.png
    https://img.pokemondb.net/sprites/sword-shield/pixel/drakloak.png
    https://img.pokemondb.net/sprites/sword-shield/pixel/dragapult.png
    https://img.pokemondb.net/sprites/sword-shield/pixel/zacian-crowned.png
    https://img.pokemondb.net/sprites/sword-shield/pixel/zamazenta-crowned.png
    https://img.pokemondb.net/sprites/sword-shield/pixel/eternatus.png
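
The URLs in the output above end in .png, while the question's save_image writes every file as img_N.jpg. As a minimal sketch, the file name could instead be taken from the URL itself. The helper name filename_from_url and its fallback parameter are hypothetical, not part of the original code, and the sketch assumes the last path segment is a usable file name:

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url, fallback="image"):
    """Return the last path segment of a URL, or `fallback` if there is none."""
    # urlparse drops the query string and fragment from the path component
    path = urlparse(url).path
    name = posixpath.basename(path)
    return name or fallback


print(filename_from_url(
    "https://img.pokemondb.net/sprites/sword-shield/pixel/dreepy.png"
))
```

Two sprites from different sets may share a file name, so a real download loop would probably still want a prefix or an index to keep names unique.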
    

Upvotes: 0

KunduK

Reputation: 33384

You need to scroll the page to the bottom. However, if you jump directly to scrollHeight you will lose the elements again. Use an infinite loop that scrolls slowly, one step at a time, and record each element's src attribute while scrolling so that none are lost along the way. I got 890 elements.

Try the below code.

import time

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://pokemondb.net/pokedex/national")

sprite_srcs = []
height = 1000
itemsnobefore = len(sprite_srcs)
while True:
    # Scroll a little further down so the next images get loaded
    driver.execute_script("window.scrollTo(0," + str(height) + ");")
    sprites_list = driver.find_elements_by_tag_name('img')

    # Record every src that has not been seen yet
    for sprite in sprites_list:
        src = sprite.get_attribute('src')
        if src not in sprite_srcs:
            sprite_srcs.append(src)

    itemsnoafter = len(sprite_srcs)
    # Break the loop when a scroll step yields no new image src
    if itemsnobefore == itemsnoafter:
        break
    itemsnobefore = itemsnoafter
    height = height + 500
    time.sleep(0.25)

# Number of distinct image URLs collected while scrolling
print(len(sprite_srcs))
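
The find_elements_by_* helpers used above were removed in later Selenium 4 releases, where find_elements(by, value) is the replacement. Below is a hedged sketch of the same scroll-and-collect idea as a reusable function. The function name collect_image_srcs and its parameters (step, pause, max_idle_rounds) are assumptions for illustration, as is the choice to stop only after several scroll steps in a row find nothing new, which makes it more tolerant of a slow page than stopping on the first empty step:

```python
import time


def collect_image_srcs(driver, step=500, pause=0.25, max_idle_rounds=3):
    """Scroll down in small steps, recording each new <img> src as it appears.

    `driver` is expected to be a Selenium WebDriver. Scrolling stops after
    `max_idle_rounds` consecutive steps that reveal no new src.
    """
    seen = set()
    srcs = []          # keeps first-seen order
    offset = 0
    idle_rounds = 0
    while idle_rounds < max_idle_rounds:
        offset += step
        driver.execute_script("window.scrollTo(0, arguments[0]);", offset)
        time.sleep(pause)  # give lazy-loaded images a moment to appear
        found_new = False
        # "tag name" is the string value of By.TAG_NAME in Selenium 4
        for img in driver.find_elements("tag name", "img"):
            src = img.get_attribute("src")
            if src and src not in seen:
                seen.add(src)
                srcs.append(src)
                found_new = True
        idle_rounds = 0 if found_new else idle_rounds + 1
    return srcs
```

It would be called as collect_image_srcs(driver) after driver.get(...). If the page has a stretch taller than step * max_idle_rounds pixels without any image, the loop would stop early, so those values may need tuning.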

Upvotes: 2

RKelley

Reputation: 1119

Not all of the elements load when the page first opens. It appears they only load as you scroll down the page. What I've done in situations like this is scroll to the bottom of the page first and then find the elements. This has worked for my needs.

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
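
On pages that keep growing as you scroll, a single jump may not reach the real bottom. A minimal sketch of repeating the jump until the document height stops changing is shown below. The function name scroll_to_bottom and its pause and max_rounds parameters are hypothetical. Note that another answer in this thread reports that jumping straight to the bottom skips images on this particular site, so this sketch may suit other pages better than this one:

```python
import time


def scroll_to_bottom(driver, pause=0.5, max_rounds=50):
    """Jump to the bottom repeatedly until the page height stops growing.

    `driver` is expected to be a Selenium WebDriver. Returns the final height.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # wait for any newly triggered content to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing was appended, so this is the real bottom
        last_height = new_height
    return last_height
```

After scroll_to_bottom(driver) returns, the elements can be located as in the question's spritescrape function.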

Upvotes: 0
