Reputation: 269
Below is an excerpt of the HTML:
<div class="Class1">Category1</div>
<div class="Class2">"Text1 I want"</div>
<div class="Class1">Category2</div>
<div class="Class2">"Text2 I want"</div>
I know I can extract Text1 and Text2 by using:
find_element = browser.find_elements_by_xpath("//div[@class='Class2']")
element = [x.text for x in find_element]
text1 = element[0]
text2 = element[1]
But if the structure of the HTML changes, the positions of the elements in that list will change too, and the hard-coded indexes will point at the wrong text. Is there a way to extract Text1 and Text2 by referring to Category1 and Category2, respectively?
Thank you.
Upvotes: 0
Views: 52
Reputation: 1672
I guess that your concern about changes to the structure of the HTML comes from the fact that the data is semantically a set of key-value pairs (the keys being the categories and the values being the text), while the markup is simply a flat list of divs in which the odd ones are the keys and the even ones that follow are their corresponding values. The problem isn't with your Selenium locators, but with the structure of the HTML itself (which in turn limits your ability to use more robust locators). I would suggest that you ask the developers to improve the structure of the HTML so that it reflects its actual semantics. Discuss together the structure that best fits all the needs, including those of the test automation.
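To make the key-value reading concrete, here is a minimal sketch that uses only the Python standard library and pairs each Class1 div with the Class2 div that follows it. The class name `PairParser` and the variable names are made up for illustration, and the sketch assumes the flat layout from the question, with the text sitting directly inside each div.

```python
from html.parser import HTMLParser

HTML = """
<div class="Class1">Category1</div>
<div class="Class2">"Text1 I want"</div>
<div class="Class1">Category2</div>
<div class="Class2">"Text2 I want"</div>
"""


class PairParser(HTMLParser):
    """Collect {category: text} from alternating Class1/Class2 divs."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._current_class = None  # class of the div being read, if any
        self._pending_key = None    # last category seen without a value yet

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._current_class = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "div":
            self._current_class = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # ignore the whitespace between the divs
        if self._current_class == "Class1":
            self._pending_key = text
        elif self._current_class == "Class2" and self._pending_key is not None:
            self.pairs[self._pending_key] = text
            self._pending_key = None


parser = PairParser()
parser.feed(HTML)
parser.close()
pairs = parser.pairs
print(pairs["Category2"])
```

Looking values up by category name instead of by list position is what keeps the lookup stable when more pairs are added or reordered.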
Upvotes: 0
Reputation: 13722
If the text you want is always inside the next sibling div of the category div, you can try the following:
Case 1
<div class="Class1">Category1</div>
<div class="Class2">"Text1 I want"</div>
//div[.='Category1']/following-sibling::div[1]
Case 2
<div class="Class1">Category1</div>
<div class="Class2">
<div class="xxx">
<span>"Text1 I want"</span>
</div>
</div>
//div[.='Category1']/following-sibling::div[1]//span
Many other structures are possible; the key part of the XPath is //div[.='Category1']/following-sibling::div[1]
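As a rough sketch of how that XPath could be used from Python, the helpers below build the expression for a given category and pass it to Selenium. The function names `value_xpath` and `text_for_category` are hypothetical, `browser` is assumed to be an already-created WebDriver, and the category text is assumed to contain no single quote. The call uses `find_element(by, value)` because newer Selenium releases no longer ship the `find_element_by_*` helpers.

```python
def value_xpath(category):
    """Build the XPath for the div that directly follows a category div."""
    # Assumption: the category text contains no single quote.
    return f"//div[.='{category}']/following-sibling::div[1]"


def text_for_category(browser, category):
    """Return the text of the div that follows the given category div."""
    # "xpath" is the locator strategy string behind Selenium's By.XPATH.
    return browser.find_element("xpath", value_xpath(category)).text


print(value_xpath("Category1"))
```

With a live driver this would be called as `text_for_category(browser, "Category1")`. For Case 2, the same idea applies with `//span` appended to the expression.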
Upvotes: 1
Reputation: 57125
I suggest using BeautifulSoup. Find the Category1 tag, then its next_sibling:
import bs4
your_html = browser.page_source
soup = bs4.BeautifulSoup(your_html, 'lxml')
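# string= is the current name for the older text= argument
class1tag = soup.find('div', string='Category1')
# find_next_sibling('div') skips the whitespace text node between the two divs
tag = class1tag.find_next_sibling('div')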
print(tag)
#<div class="Class2">"Text1 I want"</div>
print(tag.text)
#"Text1 I want"
Upvotes: 0