Reputation: 22440
I've written a script in Python with Selenium to get the titles of some images from a webpage. The content I want to parse is located near the bottom of that page, so if I try to grab it the conventional way, the browser fails to find it.
So I used some JavaScript within my scraper to make the browser scroll to the bottom, and it worked.
However, I don't think that's a good solution to keep, so I tried .scrollIntoView() instead,
but that didn't work either. What would be the ideal way to do this?
This is my script:
from selenium import webdriver
import time
URL = "https://www.99acres.com/supertech-cape-town-sector-74-noida-npxid-r922?sid=UiB8IFFTIHwgUyB8IzMxIyAgfCAxIHwgNyM0MyMgfCA4MjEyIHwjNSMgIHwg"
driver = webdriver.Chrome()
driver.get(URL)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") #I don't wish to keep this line
time.sleep(3)
for item in driver.find_elements_by_css_selector("#carousel img"):
    print(item.get_attribute("title"))
driver.quit()
Upvotes: 1
Views: 108
Reputation: 52665
Try the code below. It scrolls to the required node and then scrapes the images:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
banks = driver.find_element_by_id("xidBankSection")
driver.execute_script("arguments[0].scrollIntoView();", banks)
images = WebDriverWait(driver, 5).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#carousel img")))
for image in images:
    print(image.get_attribute("title"))
Some explanation: initially those images are absent from the page source. They are generated inside the bank section (the node with id xidBankSection) only once you have scrolled to it, so you need to scroll down to that section and then wait until the images have been generated.
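To make the scroll-then-wait idea concrete without relying on a fixed time.sleep, here is a minimal sketch of the same pattern written as a plain polling loop. The helper name wait_for_lazy_elements and its timeout and poll parameters are hypothetical, not part of Selenium. It assumes a driver object that exposes find_element, find_elements and execute_script, and it passes the locator strategies as the plain strings "id" and "css selector", which are the values behind By.ID and By.CSS_SELECTOR.

```python
import time


def wait_for_lazy_elements(driver, anchor_id, css, timeout=5.0, poll=0.25):
    """Scroll the anchor node into view, then poll until `css` matches something.

    Hypothetical helper: it only approximates what WebDriverWait with
    presence_of_all_elements_located does in the snippet above.
    """
    # Scroll to the section whose appearance in the viewport triggers loading.
    anchor = driver.find_element("id", anchor_id)
    driver.execute_script("arguments[0].scrollIntoView();", anchor)

    deadline = time.monotonic() + timeout
    while True:
        # An empty list means the images have not been generated yet.
        found = driver.find_elements("css selector", css)
        if found:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no elements matched {css!r} within {timeout}s")
        time.sleep(poll)
```

With the driver from the question this could be called as wait_for_lazy_elements(driver, "xidBankSection", "#carousel img"), followed by printing get_attribute("title") for each returned element. In real code WebDriverWait is the more natural choice; the loop is only meant to show what the wait is doing.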
Upvotes: 1
Reputation: 1289
You can try the code below:
recent_list = driver.find_elements_by_css_selector("#carousel img")
for element in recent_list:
    driver.execute_script("arguments[0].scrollIntoView();", element)
    print(element.get_attribute("title"))
Upvotes: 0