SIM

Reputation: 22440

Can't reach the bottom of a webpage

I've written a script in Python with Selenium to handle an infinite-scrolling webpage. The problem I'm facing is that it scrolls a few times and then quits the browser; it never reaches the bottom. I tried an Explicit Wait as well, but that gives even fewer scrolls. How can I reach the bottom, so that the script stops only when there is no more scrolling to do?

This is my try:

import time
from selenium import webdriver
from urllib.parse import urljoin

url = "https://www.instagram.com/explore/tags/travelphotoawards/"

driver = webdriver.Chrome()
driver.get(url)

last_len = len(driver.find_elements_by_css_selector(".v1Nh3 a"))
new_len = last_len

while True:
    last_len = new_len
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    time.sleep(5)

    items = driver.find_elements_by_css_selector(".v1Nh3 a")
    new_len = len(items)
    if last_len == new_len:
        break

driver.quit()

Edit:

If I try it like below, I can scroll as many times as I want, but hardcoding the number of scrolls is not a good way to cope with this:

import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

url = "https://www.instagram.com/explore/tags/travelphotoawards/"

driver = webdriver.Chrome()
driver.get(url)

for scroll in range(1,10):  # I can scroll as many times as I want, but the count is fully hardcoded
    item = driver.find_element_by_tag_name("body")
    item.send_keys(Keys.END)
    elems = driver.find_elements_by_css_selector(".v1Nh3 a")
    time.sleep(3)

driver.quit()

I hope there is some way to keep scrolling automatically until the script reaches the bottom.

Upvotes: 0

Views: 410

Answers (2)

Tarun Lalwani

Reputation: 146520

So, a few things here. In the case of infinite scrolling I would do the following:

  • Disable images so that the scrolling is faster.
  • Never trust a condition to be true if it is not consistent. Test it continuously for a period, and only trust it once it stays consistent.
  • Try not to scroll for too long; infinite scrolling can cause the browser to clog up too much memory and sometimes even crash.
  • Dump data in batches after every scroll. So on the first page load I would dump all the page data, and then on every scroll I would dump just the delta. This can be done easily using an XPath (see the sketch after this list).
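The delta-dump idea in the last bullet might look like the sketch below. This is a minimal illustration, not part of the original answer; the v1Nh3 class and the positional XPath are assumptions about the page structure:

# Sketch: dump only the anchors added since the last scroll, using a
# positional XPath (position() is 1-based) to skip already-dumped items.
# Caveat: if the page removes old nodes while scrolling, positions shift
# and this simple counter would need adjusting.
def dump_new_hrefs(driver, already_dumped):
    new_anchors = driver.find_elements_by_xpath(
        "(//div[contains(@class, 'v1Nh3')]/a)[position() > %d]" % already_dumped
    )
    for anchor in new_anchors:
        print(anchor.get_attribute("href"))
    return already_dumped + len(new_anchors)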

Below is an updated script which should work better for you. Do remember that nothing is perfect, so you need to make your script adapt to failures.

import time
from selenium import webdriver
from urllib.parse import urljoin

option = webdriver.ChromeOptions()
# Disable image loading so that each scroll renders faster
chrome_prefs = {
    "profile.default_content_settings": {"images": 2},
    "profile.managed_default_content_settings": {"images": 2},
}
option.add_experimental_option("prefs", chrome_prefs)


driver = webdriver.Chrome(chrome_options=option)

url = "https://www.instagram.com/explore/tags/travelphotoawards/"

driver.get(url)

last_len = len(driver.find_elements_by_css_selector(".v1Nh3 a"))
new_len = last_len

consistent = 0  # consecutive scrolls that yielded no new items
while True:
    last_len = new_len
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    time.sleep(5)
    items = driver.find_elements_by_css_selector(".v1Nh3 a")
    new_len = len(items)
    if last_len == new_len:
        consistent += 1
        if consistent == 3:
            break
    else:
        consistent = 0

driver.quit()
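
Since the question mentions Explicit Wait: the fixed time.sleep(5) in the loop above could be swapped for a bounded WebDriverWait that returns as soon as the item count grows. A minimal sketch, assuming the same .v1Nh3 a selector; it is not part of the original answer:

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait

# Inside the while loop, instead of time.sleep(5): wait up to 10 seconds for
# new items to appear; a timeout just means nothing loaded this round, and
# the consistency counter takes care of it.
try:
    WebDriverWait(driver, 10).until(
        lambda d: len(d.find_elements_by_css_selector(".v1Nh3 a")) > last_len
    )
except TimeoutException:
    pass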

Upvotes: 3

Guy

Reputation: 50854

Every time there is a scroll, older images disappear, so you might get the same number of images, or even a smaller number, after the scroll.

Each image has a unique href, so you can compare the href of the last image to the href of the previous last image:

import time

# Assumes driver has been initialized and is on the page, as in the question
last_href = driver.find_elements_by_css_selector('.v1Nh3 > a')[-1].get_attribute('href')
new_href = last_href

while True:
    last_href = new_href
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    time.sleep(5)

    new_href = driver.find_elements_by_css_selector('.v1Nh3 > a')[-1].get_attribute('href')

    if last_href == new_href:  # the last image did not change: bottom reached
        break
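
A possible refinement, combining both answers (my sketch, not from either post): require the last href to stay unchanged for a few consecutive scrolls before breaking, so that one slow load is not mistaken for the bottom:

import time

# Assumes driver is already on the page; combines the href comparison above
# with the consistency counter from the other answer.
new_href = driver.find_elements_by_css_selector('.v1Nh3 > a')[-1].get_attribute('href')
consistent = 0
while consistent < 3:
    last_href = new_href
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    new_href = driver.find_elements_by_css_selector('.v1Nh3 > a')[-1].get_attribute('href')
    consistent = consistent + 1 if last_href == new_href else 0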

Upvotes: 2
