RandallCloud

Reputation: 133

How to scroll correctly in a dynamically-loading webpage with Selenium?

Here's the link of the website: website

I would like to get all the links of the hotels in this location.

Here's my script:

import pandas as pd
import numpy as np
from selenium import webdriver
import time

PATH = "driver\chromedriver.exe"

options = webdriver.ChromeOptions() 
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1200,900")
options.add_argument('enable-logging')


driver = webdriver.Chrome(options=options, executable_path=PATH)

driver.get('https://fr.hotels.com/search.do?destination-id=10398359&q-check-in=2021-06-24&q-check-out=2021-06-25&q-rooms=1&q-room-0-adults=2&q-room-0-children=0&sort-order=BEST_SELLER')

try:
    # find_element itself throws if the button is absent, so keep it inside the try
    cookie = driver.find_element_by_xpath('//button[@class="uolsaJ"]')
    cookie.click()
except Exception:
    pass

for i in range(30):
    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(5)

time.sleep(5)

my_elems = driver.find_elements_by_xpath('//a[@class="_61P-R0"]')

links = [my_elem.get_attribute("href") for my_elem in my_elems]


X = np.array(links)
print(X.shape)
#driver.close()

But I cannot find a way to tell the script: scroll down until there is nothing more to scroll.

I tried changing these parameters:

for i in range(30):
    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(30)

I changed the time.sleep() value, the 1000-pixel scroll step, and so on, but my output keeps changing, and not in the right way.

[screenshot of the output: the number of scraped links differs between runs]

As you can see, I get a different number of scraped links each run. How can I make my script scrape the same amount each time? Not necessarily every link, but at least a stable number.

The script scrolls, but at some point it seems to get stuck and scrapes only the links loaded up to that moment. That's not what I want.
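For reference, the generic pattern often suggested for infinite scroll is to compare document.body.scrollHeight before and after each scroll and stop once it stops growing. A rough sketch, not specific to this site (the 5-second pause is arbitrary):

last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(5)  # arbitrary pause so more results can load

    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # page height stopped growing: nothing more to scroll
    last_height = new_height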

Upvotes: 2

Views: 2853

Answers (2)

Dmitriy Zub

Reputation: 1724

You can try calling the DOM directly and locating some element that appears only at the bottom of the page, using Selenium's .is_displayed() method, which returns True or False:

# https://stackoverflow.com/a/57076690/15164646
while True:
    # .is_displayed() returns False until the element becomes visible
    # "#message" is the "No more results" element at the bottom of a YouTube search
    end_result = driver.find_element_by_css_selector('#message').is_displayed()
    driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")

    # further code below

    # once the element becomes visible, break out of the while loop
    if end_result:
        break

I wrote a blog post where I used this method to scrape YouTube Search.
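Applied to the page from the question, the same idea could use the footer as the bottom-of-page marker. This is only a sketch: the //footer[@id='footer'] locator is borrowed from the other answer, the 2-second pause is arbitrary, and it assumes the footer is attached to the DOM from the start (find_elements, plural, avoids an exception if it is not there yet):

while True:
    # check whether the footer (assumed bottom-of-page marker) is visible yet
    footers = driver.find_elements_by_xpath("//footer[@id='footer']")
    at_bottom = len(footers) > 0 and footers[0].is_displayed()

    driver.execute_script(
        "var scrollingElement = (document.scrollingElement || document.body);"
        "scrollingElement.scrollTop = scrollingElement.scrollHeight;")
    time.sleep(2)  # arbitrary pause so newly loaded results can render

    if at_bottom:
        break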

Upvotes: 1

Prophet

Reputation: 33361

There are several issues here.

  1. You are getting the elements and their links only AFTER you finish scrolling, while you should do that inside the scrolling loop.
  2. You should wait for the cookie consent banner to appear before closing it.
  3. You can scroll until the footer element becomes visible.
    Something like this:
import pandas as pd
import numpy as np
from selenium import webdriver
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

PATH = "driver\chromedriver.exe"

options = webdriver.ChromeOptions() 
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1200,900")
options.add_argument('enable-logging')


driver = webdriver.Chrome(options=options, executable_path=PATH)
wait = WebDriverWait(driver, 20)

driver.get('https://fr.hotels.com/search.do?destination-id=10398359&q-check-in=2021-06-24&q-check-out=2021-06-25&q-rooms=1&q-room-0-adults=2&q-room-0-children=0&sort-order=BEST_SELLER')

wait.until(EC.visibility_of_element_located((By.XPATH, '//button[@class="uolsaJ"]'))).click()

def is_element_visible(xpath):
    wait1 = WebDriverWait(driver, 2)
    try:
        wait1.until(EC.visibility_of_element_located((By.XPATH, xpath)))
        return True
    except Exception:
        return False

while not is_element_visible("//footer[@id='footer']"):
    my_elems = driver.find_elements_by_xpath('//a[@class="_61P-R0"]')

    links = [my_elem.get_attribute("href") for my_elem in my_elems]

    X = np.array(links)
    print(X.shape)

    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(5)


#driver.close()
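If the goal is a stable final count, one variation on the loop above (a sketch, reusing the same locators and helper) is to accumulate the hrefs in a set across iterations, so cards that are re-queried or recycled by the page do not affect the total:

all_links = set()

while not is_element_visible("//footer[@id='footer']"):
    my_elems = driver.find_elements_by_xpath('//a[@class="_61P-R0"]')
    # a set silently drops duplicates from re-querying the same cards
    all_links.update(my_elem.get_attribute("href") for my_elem in my_elems)

    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(5)

print(len(all_links))  # final, stable count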

Upvotes: 2
