Sashaank

Reputation: 964

How to scrape elements with no class or id from a website

I am working on a Selenium project in which I am trying to scrape a particular element from a website. The element has no class or ID, so I am stuck on how to extract that detail.

This is the website

On the website, the HTML markup of the specifications section contains a div whose content is <b>Form</b>: Liquid. I want to extract 'Liquid'.
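To see why there is no element to select, here is a small offline sketch using Python's standard-library xml.etree.ElementTree on a hypothetical reconstruction of that markup (the answers below put the colon inside the tag, i.e. <b>Form:</b> Liquid). 'Liquid' is not wrapped in a tag of its own: it is a text node that follows </b>, which ElementTree exposes as the tail of the <b> element.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of one specification row; the real page markup may differ.
SNIPPET = "<div><b>Form:</b> Liquid</div>"

div = ET.fromstring(SNIPPET)
label = div.find("b")      # the only child element is the <b> label

print(repr(label.text))    # text inside <b>: 'Form:'
print(repr(label.tail))    # text node after </b>: ' Liquid'
print(len(div))            # number of child *elements*: 1 ('Liquid' is not one of them)
```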

This is my code so far:

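from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def extract():
    form_element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//b[text()='Form']/")))
    form_text = form_element.text
    return form_text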

This results in a TimeoutException. I am not sure what I am doing wrong.

PS: I am able to click the "Show more" button with Selenium to display the specifications area, so in case you are wondering, that is not the problem.

Upvotes: 0

Views: 1587

Answers (4)

Justin Lambert

Reputation: 978

When locating elements, an ID is the preferred locator because it is unique. If there is no ID, you can use a class name, an XPath, or link text instead.

Use this XPath:

//*[contains(text(),'Liquid')]
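In XPath, text() selects an element's own text nodes, so this expression matches the <div> (whose own text node is ' Liquid'), not the <b>. ElementTree has no contains(), so the sketch below emulates the predicate by hand on hypothetical markup (the second row is invented for illustration), checking each element's direct text nodes. Strictly it mirrors //*[text()[contains(., 'Liquid')]], which is a little more forgiving: in XPath 1.0, contains(text(), ...) only tests the first text node. Note also that this locator hard-codes the value you want to read, and the matched element's text still includes the 'Form:' label.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div>
<div><b>Form:</b> Liquid</div>
<div><b>Package Quantity:</b> 1</div>
</div>"""  # hypothetical markup

def own_text_nodes(elem):
    """Yield the text nodes that are direct children of elem (what XPath text() selects)."""
    if elem.text:
        yield elem.text
    for child in elem:
        if child.tail:
            yield child.tail

def elements_with_own_text(root, needle):
    """Emulate //*[text()[contains(., needle)]] with plain iteration (hypothetical helper)."""
    return [el for el in root.iter() if any(needle in t for t in own_text_nodes(el))]

root = ET.fromstring(SNIPPET)
matches = elements_with_own_text(root, "Liquid")
for el in matches:
    print(el.tag, repr("".join(el.itertext())))  # div 'Form: Liquid'
```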

Upvotes: 1

Irfan wani

Reputation: 5075

You can do that by creating the driver with driver = webdriver.Chrome() (assuming you are using Chrome and have ChromeDriver installed) and then calling driver.find_element_by_tag_name("h1") (say, if you want to get the h1 element and work with it). In Selenium 4 the same lookup is written driver.find_element(By.TAG_NAME, "h1"). Hope I understood your question correctly.
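Looking up by tag name returns whole elements, so on this page it would give you the <b> label rather than the value after it. A small offline sketch with xml.etree.ElementTree on hypothetical markup (the second row is invented): find() plays the role of find_element (first match only) and iter() the role of find_elements (every match).

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div>
<div><b>Form:</b> Liquid</div>
<div><b>Package Quantity:</b> 1</div>
</div>"""  # hypothetical markup

root = ET.fromstring(SNIPPET)

first_b = root.find(".//b")      # like find_element by tag name: first match only
all_b = list(root.iter("b"))     # like find_elements by tag name: every match

print(first_b.text)              # Form:
print([b.text for b in all_b])   # ['Form:', 'Package Quantity:']
print(first_b.tail.strip())      # Liquid (the value sits after the tag, not inside it)
```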

Upvotes: 0

KunduK

Reputation: 33384

To get the value Liquid, you need to click the Show more button first and then wait with visibility_of_element_located() for the element on the page. You can use either of the following approaches to get the value.

Using split()

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get("https://www.target.com/p/hawaiian-punch-fruit-juicy-red-1-gal-bottle/-/A-13051948")
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//button[@data-test='toggleContentButton' and contains(.,'Show more')]"))).click()
print(WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.XPATH,"//div[./b[text()='Form:']]"))).text.split("Form:")[-1])
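The split approach can be tried offline with ElementTree, whose [b='...'] predicate matches a <div> that has a <b> child with exactly that text, similar to //div[./b[text()='Form:']]. The markup is a hypothetical reconstruction, spec_value_by_split is a hypothetical helper, and joining itertext() only approximates Selenium's .text (which returns rendered, whitespace-normalized text). The second call shows that the match is exact: if the label is Form: as in this answer, matching on 'Form' finds nothing.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div>
<div><b>Form:</b> Liquid</div>
<div><b>Package Quantity:</b> 1</div>
</div>"""  # hypothetical markup

def spec_value_by_split(root, label):
    """Find the <div> whose <b> child reads exactly `label`, then split its text on the label."""
    # ElementTree's [b='...'] predicate plays the role of //div[./b[text()='...']]
    row = root.find(f".//div[b='{label}']")
    if row is None:
        return None
    full_text = "".join(row.itertext())  # roughly what Selenium's .text returns here
    return full_text.split(label)[-1].strip()

root = ET.fromstring(SNIPPET)
print(spec_value_by_split(root, "Form:"))  # Liquid
print(spec_value_by_split(root, "Form"))   # None: exact match fails without the colon
```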

Using the JavaScript executor

driver.get("https://www.target.com/p/hawaiian-punch-fruit-juicy-red-1-gal-bottle/-/A-13051948")
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//button[@data-test='toggleContentButton' and contains(.,'Show more')]"))).click()
print(driver.execute_script('return arguments[0].lastChild.textContent;', WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.XPATH,"//div[./b[text()='Form:']]")))))
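In the JavaScript version, lastChild is the last DOM node of the <div>, which here is the text node ' Liquid' rather than an element, so textContent returns just the value. ElementTree has no text-node objects, but the same node is the tail of the last child element. A rough emulation on hypothetical markup (last_child_text is a hypothetical helper, not a Selenium API):

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div>
<div><b>Form:</b> Liquid</div>
<div><b>Package Quantity:</b> 1</div>
</div>"""  # hypothetical markup

def last_child_text(elem):
    """Roughly emulate the DOM's elem.lastChild.textContent using ElementTree."""
    if len(elem):
        last = elem[-1]
        if last.tail:                    # text after the last child element is the last node
            return last.tail
        return "".join(last.itertext())  # otherwise the last node is that child element
    return elem.text or ""               # no child elements: only the element's own text

root = ET.fromstring(SNIPPET)
row = root.find(".//div[b='Form:']")
value = last_child_text(row)
print(repr(value))                       # ' Liquid'
print(value.strip())                     # Liquid
```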

Upvotes: 0

frianH

Reputation: 7563

Get the parent div of the element you want using this XPath:

//b[text()='Form:']//parent::div

To grab the text, it seems you have to use .get_attribute('innerHTML') instead of .text.

Try the following code:

def extract():
    form_element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//b[text()='Form:']//parent::div")))
    form_text = form_element.get_attribute('innerHTML').split("</b>",1)[1]
    return form_text
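ElementTree can mimic this answer offline: [.='Form:'] matches the <b> by its exact text and .. steps up to its parent, like //b[text()='Form:']//parent::div. ElementTree has no innerHTML, so inner_html below is a hypothetical helper that rebuilds it, and the markup is a hypothetical reconstruction.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div>
<div><b>Form:</b> Liquid</div>
<div><b>Package Quantity:</b> 1</div>
</div>"""  # hypothetical markup

def inner_html(elem):
    """Rebuild elem's innerHTML: its leading text plus each serialized child.

    ET.tostring() includes a child's tail text, so the text after </b> is kept.
    """
    return (elem.text or "") + "".join(ET.tostring(c, encoding="unicode") for c in elem)

root = ET.fromstring(SNIPPET)
# Same idea as //b[text()='Form:']//parent::div: match the label, then step up to its parent.
row = root.find(".//b[.='Form:']/..")
html = inner_html(row)
print(repr(html))                        # '<b>Form:</b> Liquid'
print(html.split("</b>", 1)[1].strip())  # Liquid
```

Without the final strip() the result keeps its leading space (' Liquid'), which is what the answer's split("</b>",1)[1] would also return if the browser's innerHTML matches this snippet.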

Upvotes: 1
