GaëtanLF

Reputation: 101

BeautifulSoup doesn't get the full HTML code

I'm having some trouble with this code, where I try to get all the Pokémon names from pokedex.org. My original code is the following:

import requests
from bs4 import BeautifulSoup

url = 'https://pokedex.org/'
html = BeautifulSoup(requests.get(url).content,'lxml')

uls = html.find('ul', attrs = {'id':'monsters-list'})

print(uls.prettify())

Then uls should contain <li></li> elements which themselves contain <span></span> elements where the name is wrapped. It works fine for exactly the first 100 Pokémon, but it returns empty <li></li> elements for the other 500. I've tried different parsers such as html.parser, html5lib and lxml, but it doesn't change anything.
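
A quick way to confirm that the missing names simply aren't in the HTML that requests receives is to count how many list items actually contain a name (a minimal check, reusing the same url and parser as above):

import requests
from bs4 import BeautifulSoup

url = 'https://pokedex.org/'
html = BeautifulSoup(requests.get(url).content, 'lxml')
uls = html.find('ul', attrs={'id': 'monsters-list'})

# count <li> entries whose <span> actually holds a name
named = [li for li in uls.find_all('li')
         if li.find('span') and li.find('span').get_text(strip=True)]
print(len(uls.find_all('li')), 'list items,', len(named), 'with a name')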

Upvotes: 1

Views: 3133

Answers (2)

KailasMM

Reputation: 106

It looks like the element is being created by JavaScript, and requests can't handle elements that are generated dynamically by JavaScript (correct me if I'm wrong).

I suggest using Selenium together with ChromeDriver to get the page source; then you can use BeautifulSoup for parsing.

(Assuming you use the Chrome browser)

  1. Visit chrome://settings/help and check your Chrome version.
  2. Download the matching version of ChromeDriver from the official website (https://chromedriver.chromium.org/downloads).
  3. Place chromedriver.exe and your Python file in the same directory.

Finally, we get to the code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# headless background execution
options = Options()
options.headless = True

url = "https://pokedex.org/"
browser = webdriver.Chrome(options=options)
browser.get(url)

# parse the page source rendered by the browser, not a fresh requests download
html = BeautifulSoup(browser.page_source, 'lxml')
uls = html.find('ul', attrs={'id': 'monsters-list'})

print(uls.prettify())
browser.quit()

Upvotes: 1

Samsul Islam

Reputation: 2609

The page is loaded dynamically, so requests won't work on its own. We can use Selenium instead to scrape the page, and we also need to scroll the page down.

Install it with: pip install selenium.

Download the correct ChromeDriver from here. Here is the code:

from bs4 import BeautifulSoup
from selenium import webdriver
import time

url = 'https://pokedex.org/'
driver = webdriver.Chrome()
driver.get(url)
time.sleep(2)

# scroll to the bottom so the remaining list items get rendered
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)
html = BeautifulSoup(driver.page_source, 'lxml')

uls = html.find('ul', attrs={'id': 'monsters-list'})

print(uls.prettify())

Output (last item):

<li style="background: linear-gradient(90deg, #B8B8D0 50%, #A8B820 50%)">
  <button class="monster-sprite sprite-649" type="button">
  </button>
  <span>
   Genesect
  </span>
 </li>
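
If you only need the names themselves, you can collect the text of the <span> inside each <li> afterwards (a small follow-up sketch reusing the uls variable from the code above):

# pull the Pokémon names out of the parsed list
names = [span.get_text(strip=True) for span in uls.find_all('span')]
print(len(names), names[:5])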

Upvotes: 2
