pythonlearn

Reputation: 439

Response fails to update with selenium scroll

The script is supposed to collect all the links from base_url, which initially displays only a subset of results; scrolling down appends more results until the list is exhausted. The scrolling itself works, but I only ever retrieve the few links that load when the page first appears, before any scroll. The response should update as the web driver scrolls. This is my code so far.

import re
import requests
import time

from bs4 import BeautifulSoup
from selenium import webdriver

mybrowser = webdriver.Chrome(r"E:\chromedriver.exe")  # raw string so the backslash is not treated as an escape

base_url = "https://genius.com/search?q="+"drake"

myheader = {'User-Agent':''}

mybrowser.get(base_url)
t_end = time.time() + 60 * 1
while(time.time()<t_end):
    mybrowser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    response = requests.get(base_url, headers = myheader)
    soup = BeautifulSoup(response.content, "lxml")

pattern = re.compile(r"\S+-lyrics$")  # raw string avoids an invalid-escape warning

for link in soup.find_all('a',href=True):
    if pattern.match(link['href']):
        print (link['href'])

This only prints the first few links; the links that load while Selenium scrolls the page are never retrieved.
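For what it's worth, the filtering regex itself behaves as intended; here is a quick check against two illustrative hrefs (made up for this example, not taken from the live page):

```python
import re

# Same pattern as in the script, written as a raw string
pattern = re.compile(r"\S+-lyrics$")

hrefs = [
    "https://genius.com/Drake-hotline-bling-lyrics",  # song page: matches
    "https://genius.com/artists/Drake",               # artist page: no match
]

print([h for h in hrefs if pattern.match(h)])
# → ['https://genius.com/Drake-hotline-bling-lyrics']
```

So the problem is not the pattern but the HTML being searched.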

Upvotes: 1

Views: 306

Answers (1)

xrisk

Reputation: 3898

You need to parse the HTML from Selenium itself (its page source changes as Selenium scrolls the page), rather than using requests to download the page. requests fetches a fresh, unscrolled copy of the page on every call, so it never sees the results that scrolling loaded into the browser.

Change:

response = requests.get(base_url, headers = myheader)
soup = BeautifulSoup(response.content, "lxml")

to:

html = mybrowser.page_source
soup = BeautifulSoup(html, "lxml")

And it should work just fine.
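To see why this works: BeautifulSoup does not care where the HTML string comes from, so the parsing step is identical whether the input is mybrowser.page_source or any other string. The static snippet below stands in for the post-scroll page source (its hrefs are invented for illustration):

```python
import re

from bs4 import BeautifulSoup

# Static stand-in for mybrowser.page_source after scrolling
html = """
<a href="https://genius.com/Drake-hotline-bling-lyrics">Hotline Bling</a>
<a href="https://genius.com/Drake-one-dance-lyrics">One Dance</a>
<a href="https://genius.com/artists/Drake">Artist page</a>
"""

# "html.parser" is the stdlib builder; "lxml" behaves the same if installed
soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r"\S+-lyrics$")

for link in soup.find_all("a", href=True):
    if pattern.match(link["href"]):
        print(link["href"])
```

In the real script you would also want a short time.sleep inside the scroll loop so each batch of results has time to load before the next scroll.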

Upvotes: 2
