Reputation: 269

Extracting text using selenium

Below is an excerpt of the html code:

<div class="Class1">Category1</div>
<div class="Class2">"Text1 I want"</div>
<div class="Class1">Category2</div>
<div class="Class2">"Text2 I want"</div>

I know I can extract Text1 and Text2 by using:

find_element = browser.find_elements_by_xpath("//div[@class='Class2']")
element = [x.text for x in find_element]
text1 = element[0]
text2 = element[1]

But if the structure of the HTML changes, the list indices will change accordingly. Is there a way to extract Text1 and Text2 by referring to Category1 and Category2, respectively?

Thank you.

Upvotes: 0

Answers (3)

Arnon Axelrod

Reputation: 1672

I guess that your concern about changes to the structure of the HTML comes from the fact that the data is semantically a set of key-value pairs (the keys being the categories and the values being the text), while the structure is simply a list of divs where the odd ones are the keys and the following even ones are their corresponding values. The problem, though, isn't with your Selenium locators but with the structure of the HTML itself (which consequently limits your ability to use more robust locators). I would suggest that you ask the developers to improve the structure of the HTML to reflect its appropriate semantics. Discuss together the best structure that fits all the needs, including those of the test automation.

Upvotes: 0

yong

Reputation: 13722

If the text you want is always inside the next sibling div of the category div, you can try the following:

Case 1

<div class="Class1">Category1</div>
<div class="Class2">"Text1 I want"</div>

//div[.='Category1']/following-sibling::div[1]

Case 2

<div class="Class1">Category1</div>
<div class="Class2">
  <div class="xxx">
    <span>"Text1 I want"</span>
  </div>
</div>

//div[.='Category1']/following-sibling::div[1]//span

There can be many possible structures; the key part of the XPath is //div[.='Category1']/following-sibling::div[1]
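As a rough sketch of how the sibling XPath could be used from Python, the helper below builds the expression for a given category and passes it to Selenium. The function names `category_value_xpath` and `get_category_text` are hypothetical, `browser` is assumed to be the WebDriver from the question, and the lookup assumes a recent Selenium release where `find_element(By.XPATH, ...)` is available. It also assumes the category text contains no single quotes.

```python
def category_value_xpath(category: str) -> str:
    """Build an XPath for the first div following the div whose text equals `category`.

    Hypothetical helper; assumes `category` contains no single quotes.
    """
    return f"//div[.='{category}']/following-sibling::div[1]"


def get_category_text(browser, category: str) -> str:
    """Return the text of the div that follows the given category div.

    `browser` is assumed to be a Selenium WebDriver instance.
    """
    # Imported here so the XPath helper can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    return browser.find_element(By.XPATH, category_value_xpath(category)).text
```

With this in place, `get_category_text(browser, "Category1")` would look up the value by its category rather than by its position in a list, so adding or reordering categories should not change which text is returned.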

Upvotes: 1

DYZ

Reputation: 57125

I suggest using BeautifulSoup. Find the Category1 tag, then its next_sibling:

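import bs4

# `browser` is the Selenium WebDriver from the question.
your_html = browser.page_source
soup = bs4.BeautifulSoup(your_html, 'lxml')

# Locate the div whose text is exactly 'Category1'.
class1tag = soup.find('div', string='Category1')
# find_next_sibling skips the whitespace text node between the two divs,
# so the lookup does not depend on how the HTML is indented.
tag = class1tag.find_next_sibling('div')
print(tag)
#<div class="Class2">"Text1 I want"</div>
print(tag.text)
#"Text1 I want"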

Upvotes: 0
