Michael Mü

Reputation: 81

Python: Selenium Driver find_elements_by_xpath: Issue

I want to extract elements from various webpages using the Selenium driver package. I identify target elements by their text, using find_elements_by_xpath. Although I thought I had handled whitespace, line breaks, etc., the following element is NOT found by my code.

This is the element that I am trying to find by using its text:

x = """<p align="left"><font face="Arial" color="#439539" size="5">Compensation
Discussion<br>&amp; Analysis</font></p>"""

(A screenshot of the original source code of the respective webpage was attached here.)

This is the code that I am currently using to find elements that contain the text "Compensation Discussion & Analysis":

searchterm = "Compensation Discussion & Analysis"

driver.find_elements_by_xpath("//*[contains(normalize-space(translate(., '\u00A0', ' ')), '" + searchterm + "')]")

I know that there are ways to match only part of my search term, such as starts-with() and the like. However, I would strongly prefer to keep searching for the entire search term without splitting it into its components.

Any help is highly appreciated! Thanks a lot in advance!

Upvotes: 1

Views: 259

Answers (2)

RichEdwards

Reputation: 3753

What you have looks good and I would expect normalize-space() to work. However, that <br> in the middle is clearly an interesting one.

What I can tell you is that the <br> splits the text into 2 nodes. You actually have text() and text()[2].

I've only tried this in Chrome DevTools, not in Selenium yet, but try this XPath:

//font[contains(normalize-space(concat(text(), ' ', text()[2])),'Compensation Discussion & Analysis')]

(Note that I matched this to font, but you can update that as needed.)

This matches your troublesome element and others by full text, which I think is what you're after.

(A DevTools screenshot showing the match was attached here.)

It may also be useful to know that you can add further items to the concat, even if those nodes don't exist, and still retain the matches:

//font[contains(normalize-space(concat(text(), ' ', text()[2], ' ', text()[3])),'Compensation Discussion & Analysis')]


That might mean one identifier to match them all.
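As a rough sketch of that "one identifier" idea, the expression could be generated for any search term and any number of text nodes. The helper names build_text_xpath and quote_xpath_literal, and the max_nodes parameter, are made up for this illustration and are not part of Selenium:

```python
def quote_xpath_literal(value):
    """Wrap a string in quotes for XPath 1.0, which has no escape syntax."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    raise ValueError("search term contains both quote types; not handled in this sketch")


def build_text_xpath(searchterm, tag="font", max_nodes=3):
    """Build the concat()-based expression from the answer above.

    text()[1] is what a bare text() evaluates to inside concat(), so the
    result is equivalent to the hand-written version.
    """
    if max_nodes < 2:
        raise ValueError("concat() needs at least two arguments")
    parts = ", ' ', ".join(f"text()[{i}]" for i in range(1, max_nodes + 1))
    literal = quote_xpath_literal(searchterm)
    return f"//{tag}[contains(normalize-space(concat({parts})), {literal})]"


xpath = build_text_xpath("Compensation Discussion & Analysis")
print(xpath)
```

The resulting string can be passed to find_elements_by_xpath as in the question. I have only checked the string that gets built here, not the behaviour in a live browser session.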


Final comment: you can see that I join the two nodes WITH A SPACE in concat(text(), ' ', text()[2]). This is because the text of the nodes is Compensation Discussion↵& Analysis, with no space between "Discussion" and "&". Adding this space makes the result consistent with the rest of the document.
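To see this outside the browser, here is a small sketch using only the Python standard library. It pulls the text nodes out of the fragment from the question and compares the two ways of joining them. The normalize_space function is my own rough stand-in for XPath's normalize-space(), not the real XPath engine:

```python
import re
from html.parser import HTMLParser

FRAGMENT = '''<p align="left"><font face="Arial" color="#439539" size="5">Compensation
Discussion<br>&amp; Analysis</font></p>'''


class TextNodeCollector(HTMLParser):
    """Collects each run of character data between tags as one text node."""

    def __init__(self):
        super().__init__()  # default convert_charrefs=True turns &amp; into &
        self.nodes = []

    def handle_data(self, data):
        self.nodes.append(data)


def normalize_space(s):
    """Stand-in for XPath normalize-space(): trim and collapse space, tab, CR, LF."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" \t\r\n")


collector = TextNodeCollector()
collector.feed(FRAGMENT)
collector.close()

searchterm = "Compensation Discussion & Analysis"
string_value = normalize_space("".join(collector.nodes))        # like normalize-space(.)
joined_with_space = normalize_space(" ".join(collector.nodes))  # like the concat() version

print(collector.nodes)
print(searchterm in string_value)       # no space before "&", so no match
print(searchterm in joined_with_space)  # the added space makes it match
```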



[update]

After all the above (which works!) I thought about that "final comment" again....

I looked again and normalize-space does work - your text just doesn't have a space before the ampersand...


Upvotes: 2

M.S.Z

Reputation: 62

Try this if you are looking for the entire search term on the page:

string = driver.find_element_by_xpath("//div[19]/table[1]/tbody[1]/tr[20]/td[1]/font[1]")
print(string.text)
# OR
print(string.get_attribute("innerHTML"))

This should do the job!

Upvotes: -1
