LdM

Reputation: 704

Using xpath for looking at specific elements to scrape

I'm trying to get some information from a website, scamadviser.com. In particular, I'm interested in the final score shown in the shield (for example, for stackoverflow.com the value in the shield is 100%). I inspected the page to find the element's path (shown in a screenshot that is not reproduced here).

This is what I tried:

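import pandas as pd
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

def scam(df):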
    chrome_options = webdriver.ChromeOptions()

    trust=[]
    country = [] 
    isp_country = [] 
        
    urls=['stackoverflow.com','GitHub.com']
    driver=webdriver.Chrome('mypath',chrome_options=chrome_options)
    
    for x in urls:
        
        wait = WebDriverWait(driver, 20)
        response=driver.get('https://www.scamadviser.com/check-website/'+x)
        
        try: 
            wait = WebDriverWait(driver, 30)
            
            t=driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", driver.find_element_by_xpath("//div[contains(@class,'trust__overlay shield-color--green') and contains(text(),'icon')]")).get_attribute('innerText')
            trust.append(t)  

            c=driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", driver.find_element_by_xpath("//div[contains(@class,'block__col') and contains(text(),'Country')]")).get_attribute('innerText')
            country.append(c)  

            ic=driver.find_element_by_xpath("//div[contains(@class,'block__col') and contains(text(),'ISP')]").get_attribute('innerText')
            isp_country.append(ic)
        
        except: 
            trust.append("Error")
            country.append("Error")
            isp_country.append("Error")
            

    # Create dataframe
    dict = {'URL': urls, 'Trust':trust, 'Country': country, 'ISP': isp_country} 
    df=pd.DataFrame(dict)

    driver.quit()
    
    return df

but the resulting dataframe contains only "Error" values, i.e. every iteration ends up in the except branch of the try/except.

I can't tell whether the problem lies in the try/except itself or in the way I locate the elements with XPath. Any help would be great. Thanks

Upvotes: 1

Views: 61

Answers (1)

cruisepandey

Reputation: 29382

To get the trust score from the website the OP mentions, the XPath below matches exactly one node (1/1) in the HTML DOM.

XPath:

//div[text()='Trustscore']/../following-sibling::div/descendant::div[@class='icon']

You do not need to scroll to this web element, because the trust score is already in the Selenium viewport as soon as the window is launched.

Use it with an explicit wait, like this:

trusted_score = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Trustscore']/../following-sibling::div/descendant::div[@class='icon']")))
print(trusted_score.text)

You'll need these imports as well:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

PS: Make sure the Selenium browser window is maximized when it is launched:

driver.maximize_window()

Update 1:

data = {'URL': urls, 
        'Trust': trust, 
        'Country': country, 
        'ISP': isp_country}
df = pd.DataFrame.from_dict(data)
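
On the try/except part of the question: a bare except hides which step actually failed, so every failure looks the same in the dataframe. Below is a minimal sketch of one way to keep the cause visible. The helper name safe_text and the two stand-in functions are hypothetical, used only so the sketch runs without a browser; in the real loop the callable would wrap the Selenium lookup.

```python
def safe_text(lookup):
    """Call lookup() and return its result, or a label naming the exception."""
    try:
        return lookup()
    except Exception as exc:  # narrower than a bare except; keeps the cause
        return f"Error: {type(exc).__name__}"


# Hypothetical stand-ins for the Selenium calls (no browser needed here).
def found():
    # Plays the role of a lookup that succeeds and returns the element text.
    return "100%"


def missing():
    # execute_script() returns None when the JavaScript has no `return`,
    # so chaining .get_attribute() onto its result fails like this.
    return None.get_attribute("innerText")


print(safe_text(found))    # 100%
print(safe_text(missing))  # Error: AttributeError
```

With something like this in place of the three "Error" appends, the dataframe would show whether each lookup failed with a NoSuchElementException, a TimeoutException or an AttributeError, which tells you whether the XPath or the surrounding code is at fault.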

Upvotes: 1
