jeremoquai

Reputation: 101

Why doesn't render() in requests-html scrape dynamic content?

Long story short: I switched from Selenium to requests-html.

It works fine, but not in every case.

Page: https://www.winamax.fr/paris-sportifs/sports/1/1/1

On load, the page pulls in dynamic content listing English matches (for example, Sheffield United - West Ham).

But when I try this:

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://www.winamax.fr/paris-sportifs/1/1/1')
r.html.render()
print(r.html.text) # I also tried print(r.html.html)

the matches don't show up in the output.

Why? Thanks!

Upvotes: 4

Views: 9294

Answers (2)

Eva423

Reputation: 53

The only thing that worked for me was passing the sleep parameter to render(), which pauses for a few seconds after the initial render so the page's JavaScript has time to load the content:

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://www.winamax.fr/paris-sportifs/sports/1/1/1')
r.html.render(sleep=10)
print(r.html.html)
session.close()

From the requests-html documentation:

render(retries: int = 8, script: str = None, wait: float = 0.2, scrolldown=False, sleep: int = 0, reload: bool = True, timeout: Union[float, int] = 8.0, keep_page: bool = False, cookies: list = [{}], send_cookies_session: bool = False)

Reloads the response in Chromium, and replaces HTML content with an updated version, with JavaScript executed.

Parameters:

  • retries – The number of times to retry loading the page in Chromium.
  • script – JavaScript to execute upon page load (optional).
  • wait – The number of seconds to wait before loading the page, preventing timeouts (optional).
  • scrolldown – Integer, if provided, of how many times to page down.
  • sleep – Integer, if provided, of how many seconds to sleep after initial render.
  • reload – If False, content will not be loaded from the browser, but will be provided from memory.
  • keep_page – If True will allow you to interact with the browser page through r.html.page.
  • send_cookies_session – If True send HTMLSession.cookies convert.
  • cookies – If not empty send cookies.
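Since the right sleep value depends on how slowly the page loads its data, one option is to re-render with progressively longer sleeps until some expected text appears. The following is only a rough sketch, not something from the requests-html docs: render_until and fetch_games are hypothetical helper names, the sleep schedule (2, 5, 10) and timeout=20 are arbitrary assumptions, and the marker is whatever string you expect in the rendered page (for example a team name).

```python
def render_until(html, marker, sleeps=(2, 5, 10), timeout=20):
    """Re-render `html` with longer and longer sleeps until `marker` appears.

    `html` is assumed to behave like `r.html` from requests-html: it needs a
    `render(sleep=..., timeout=...)` method and a `text` attribute.
    Returns the sleep value that worked, or None if the marker never showed up.
    """
    for sleep in sleeps:
        # Each call reloads the page in Chromium and waits `sleep` seconds
        # after the initial render before capturing the HTML.
        html.render(sleep=sleep, timeout=timeout)
        if marker in html.text:
            return sleep
    return None


def fetch_games(url, marker):
    """Hypothetical wrapper: fetch `url`, render it, return (sleep_used, html)."""
    # Imported lazily so render_until can be tried out without Chromium installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        r = session.get(url)
        used = render_until(r.html, marker)
        return used, r.html.html
    finally:
        # Always close the session so the headless browser is shut down.
        session.close()
```

Calling fetch_games('https://www.winamax.fr/paris-sportifs/sports/1/1/1', 'West Ham') would then try a short sleep first and only fall back to the 10-second wait if the content is still missing. Whether the page actually serves the same content to a headless browser is not something this sketch can guarantee.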

Upvotes: 1

fardV

Reputation: 308

Pass a larger timeout to render() (the default is 8 seconds) and it should work. Sorry, this should be a comment, but I cannot comment yet.

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://www.winamax.fr/paris-sportifs/sports/1/1/1')
r.html.render(timeout=20)
print(r.html.html)
session.close()

Upvotes: 4
