nzalle
nzalle

Reputation: 29

How to continue a script even if there is a missing element on the current page?

I am working on a scraping project and am trying so scrape many different profiles. Not all of the profiles have the same information, so I want to skip that piece of data if the current profile does not have it. Here is my current code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from time import sleep


driver = webdriver.Chrome("MY DIRECTORY")
driver.get("https://directory.bcsp.org/")
count = int(input("Number of Pages to Scrape: "))

body = driver.find_element_by_xpath("//body") #
profile_count = driver.find_elements_by_xpath("//div[@align='right']/a")

while len(profile_count) < count: # Get links up to "count"
    body.send_keys(Keys.END)
    sleep(1)
    profile_count = driver.find_elements_by_xpath("//div[@align='right']/a")

for link in profile_count: # Calling up links
    temp = link.get_attribute('href') # temp for
    driver.execute_script("window.open('');") # open new tab
    driver.switch_to.window(driver.window_handles[1]) # focus new tab
    driver.get(temp)


    ##### SCRAPE CODE #####
    Name = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody/tr/td[5]/div/table[1]/tbody/tr/td[1]/div[2]/div')

    IssuedBy = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody/tr/td[5]/div/table[1]/tbody/tr/td[3]/table/tbody/tr[1]/td[1]/div[2]')

    CertificationNumber = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody/tr/td[5]/div/table[1]/tbody/tr/td[3]/table/tbody/tr[1]/td[3]/div[2]')

    CertfiedSince = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody/tr/td[5]/div/table[1]/tbody/tr/td[3]/table/tbody/tr[3]/td[1]/div[2]')

    RecertificationCycle = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody/tr/td[5]/div/table[1]/tbody/tr/td[3]/table/tbody/tr[3]/td[3]/div[2]')

    Expires = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody/tr/td[5]/div/table[1]/tbody/tr/td[3]/table/tbody/tr[5]/td[1]/div[2]')

    AccreditedBy = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody/tr/td[5]/div/table[1]/tbody/tr/td[3]/table/tbody/tr[5]/td[3]/div[2]/a')

    print(Name.text + " : " + IssuedBy.text + " : " + CertificationNumber.text + " : " + CertfiedSince.text + " : " + RecertificationCycle.text + " : " + Expires.text + " : " + AccreditedBy.text)


    driver.close()
    driver.switch_to.window(driver.window_handles[0])
driver.close()

Please let me know how I would be able to skip an element if it is not present on the current profile.

Upvotes: 1

Views: 53

Answers (2)

Osadhi Virochana
Osadhi Virochana

Reputation: 1302

It will fix try:.. except:.. but you have some other errors too. I fixed them all. Code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from time import sleep


driver = webdriver.Chrome('chromedriver')
driver.get("https://directory.bcsp.org/")
count = int(input("Number of Pages to Scrape: "))

body = driver.find_element_by_xpath("//body") #
profile_count = driver.find_elements_by_xpath("//div[@align='right']/a")
c = 1
while c <= count:
    for link in profile_count: # Calling up links
        temp = link.get_attribute('href') # temp for
        driver.execute_script("window.open('');") # open new tab
        driver.switch_to.window(driver.window_handles[1]) # focus new tab
        driver.get(temp)
        sleep(1)
        ##### SCRAPE CODE #####
        try:
            Name = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody/tr/td[5]/div/table[1]/tbody/tr/td[1]/div[2]/div')

            IssuedBy = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody/tr/td[5]/div/table[1]/tbody/tr/td[3]/table/tbody/tr[1]/td[1]/div[2]')

            CertificationNumber = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody/tr/td[5]/div/table[1]/tbody/tr/td[3]/table/tbody/tr[1]/td[3]/div[2]')

            CertfiedSince = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody/tr/td[5]/div/table[1]/tbody/tr/td[3]/table/tbody/tr[3]/td[1]/div[2]')

            RecertificationCycle = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody/tr/td[5]/div/table[1]/tbody/tr/td[3]/table/tbody/tr[3]/td[3]/div[2]')
        except:
            c -= 1
        driver.switch_to.window(driver.window_handles[0])
        c += 1
        if c > count:
            break
driver.quit()

Upvotes: 1

Lizzy
Lizzy

Reputation: 119

According to the docs, find_element_by_xpath() raises a NoSuchElementException if the element you're looking for couldn't be found.

I suggest handling potential NoSuchElementExceptions accordingly. What a proper exception handling could look like depends on what you're trying to achieve, you might want to log an error, assign default values, skip certain follow up actions...

try:
    Name = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody/tr/td[5]/div/table[1]/tbody/tr/td[1]/div[2]/div')
except NoSuchElementException:
    Name = "Default Name"

You could even wrap multiple find_element_by_xpath() calls in your try block.

Upvotes: 2

Related Questions