Python: Selenium Driver find_elements_by_xpath: Issue

Question

I want to extract elements from various webpages using the Selenium driver package. I identify target elements by their text, using find_elements_by_xpath. Although I thought I had dealt with whitespace, line breaks and the like, the following element is NOT found by my code, unfortunately.

This is the element that I am trying to find by using its text:

x = """Compensation
Discussion
& Analysis"""

This is a screenshot of the original code of the respective webpage.

This is the code that I am currently using to identify elements that contain the text "Compensation Discussion & Analysis":

searchterm = "Compensation Discussion & Analysis"

driver.find_elements_by_xpath("//*[contains(normalize-space(translate(., '\u00A0', ' ')), '" + searchterm + "')]")

I know that there might be ways to match only parts of my search term, such as starts-with() and the like. However, I would strongly prefer to keep searching for the entire search term without splitting it into its components.

Any help is highly appreciated! Thanks a lot in advance!

RichEdwards · Accepted Answer

What you have looks good and I would expect normalize-space() to work. However, that br in the middle is clearly an interesting one.

What I can tell you is that the br is causing the text to be split into two nodes. You actually have text() and text()[2].
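To look at the two text nodes outside the browser, here is a minimal sketch using Python's standard xml.etree.ElementTree. The snippet is a hand-written stand-in for the element (the real markup is only visible in the screenshot, so treat it as an assumption). ElementTree exposes the first text node as .text and the text after the br as the br's .tail.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the element in the question; the real markup may differ.
snippet = "<font>Compensation\nDiscussion<br/>&amp; Analysis</font>"
font = ET.fromstring(snippet)

first_node = font.text              # text before the <br/>, i.e. text()
second_node = font.find("br").tail  # text after the <br/>, i.e. text()[2]

print(repr(first_node))
print(repr(second_node))
```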

I've only tried this in Chrome and have not attempted it in Selenium yet, but try this XPath:

//font[contains(normalize-space(concat(text(), ' ', text()[2])),'Compensation Discussion & Analysis')]

(Note that I matched this to font, but you can update it as needed.)

This matches your troublesome object and others by full text, which I think is what you're after.

This is how my devtools looks:

It may also be useful that you can add further items to the concat, even if those nodes don't exist, and still retain the matches:

//font[contains(normalize-space(concat(text(), ' ', text()[2], ' ', text()[3])),'Compensation Discussion & Analysis')]

That might mean one identifier to match them all.
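If you want to build that XPath from Python rather than typing it out, something along these lines could work. It is only a sketch: build_text_node_xpath and find_by_full_text are made-up helper names, the font tag and the node count are assumptions, and it uses text()[1] where the expressions above use text() (inside concat the two give the same string). Newer Selenium 4 releases dropped find_elements_by_xpath in favour of find_elements(By.XPATH, ...), so the sketch uses that form.

```python
def build_text_node_xpath(tag, term, max_nodes=3):
    """Build an XPath that joins the first max_nodes text nodes with spaces."""
    if "'" in term:
        # The term is wrapped in single quotes below, so a quote would break it.
        raise ValueError("search term must not contain a single quote")
    if max_nodes < 2:
        raise ValueError("need at least two text nodes to join")
    pieces = [f"text()[{i}]" for i in range(1, max_nodes + 1)]
    joined = ", ' ', ".join(pieces)
    return f"//{tag}[contains(normalize-space(concat({joined})), '{term}')]"


def find_by_full_text(driver, term, tag="font", max_nodes=3):
    """Run the query; driver is assumed to be a Selenium 4 WebDriver."""
    # Imported here so the XPath builder can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    return driver.find_elements(By.XPATH, build_text_node_xpath(tag, term, max_nodes))


print(build_text_node_xpath("font", "Compensation Discussion & Analysis"))
```

Missing text nodes just contribute empty strings to the concat, so a generous max_nodes should not lose matches.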

Final comment: you can see that in the middle I join the two nodes WITH A SPACE, concat(text(), ' ', text()[2]). This is because the text of the nodes is Compensation Discussion↵& Analysis, so there is no space between "Discussion" and "&". Adding this space makes it consistent with the rest of the document.

[update]

After all the above (which works!) I thought about that "final comment" again....

I looked again and normalize-space does work - your text just doesn't have a space before the ampersand...
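To make that concrete, here is a rough pure-Python imitation of what the original XPath sees. It is a sketch under assumptions: the markup is a hand-written stand-in for the real element, and normalize_space only approximates the XPath function (Python's split() treats more characters as whitespace than XPath does). The string value of the element is its text nodes glued together with nothing in between, so the br contributes no space.

```python
import xml.etree.ElementTree as ET


def normalize_space(s):
    # Rough stand-in for XPath normalize-space(): trim and collapse whitespace runs.
    return " ".join(s.split())


searchterm = "Compensation Discussion & Analysis"

# Hypothetical stand-in for the element in the question; the real markup may differ.
font = ET.fromstring("<font>Compensation\nDiscussion<br/>&amp; Analysis</font>")
nodes = list(font.itertext())

string_value = "".join(nodes)   # what "." evaluates to in the original XPath
with_space = " ".join(nodes)    # what concat(text(), ' ', text()[2]) produces

matches_plain = searchterm in normalize_space(string_value)
matches_joined = searchterm in normalize_space(with_space)

print(repr(normalize_space(string_value)), matches_plain)
print(repr(normalize_space(with_space)), matches_joined)
```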
