Reputation: 119
I am newbie to web scraping using selenium and I am scraping seetickets.us My scraper works as follows.
Now the problem is that some of the events do not contain some elements such as this event: https://wl.seetickets.us/event/Beta-Hi-Fi/484490?afflky=WorldCafeLive
which does not contain pricing table but this one does
https://www.seetickets.us/event/Wake-Up-Daisy-1100AM/477633
so I have used try except blocks
try:
find element
except:
return none
but if it doesnt found the element in try, it takes 5 seconds to go to except because I have used
webdriver.implicitwait(5)
Now , if any page does not contain multiple elements , the selenium takes very much time to scrape that page.
I have thousands of pages to scrape. What should be done to speed up the process.
Thanks
Upvotes: 3
Views: 2881
Reputation: 309
Instead of using implicitWait and waiting for each individual element, only wait for the full page load, for example wait for h1 tag, which will indicate the full page has been loaded then proceed with extraction.
#wait for page load
try:
pageLoadCheck=WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH, "(//h1)[1]"))).get_attribute("textContent").strip()
#extract data without any wait once the page is loaded
try:
dataOne=driver.find_element_by_xpath("((//h1/following-sibling::div)[1]//a[contains(@href,'tel:')])[1]").get_attribute("textContent").strip()
except:
dataOne=''
except Exception as e:
print(e)
Upvotes: 1
Reputation: 193338
To speedup web scraping using Selenium:
Your effective code block will be:
try:
element = WebDriverWait(driver, 3).until(EC.visibility_of_element_located((By.ID, "input"))))
print("Element is visible")
except TimeoutException:
print("Element is not visible")
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Upvotes: 1
Reputation: 321
Instead of ImplicitWait try to use ExplicitWait but apply it to search of main container only to wait for content to be loaded. For all inner elements apply find_element
with no waits.
P.S. It's always better to share your real code instead of pseudo-code
Upvotes: 1