Jack Fleeting

Reputation: 24930

How to use 'find_elements_by_xpath' inside a for loop

I'm somewhat (or very) confused about the following:

from selenium.webdriver import Chrome
driver = Chrome()

html_content = """
<html>
     <head></head>
     <body>
         <div class='first'>
             Text 1
         </div>
         <div class="second">
             Text 2
                 <span class='third'> Text 3 
                 </span>              
         </div>
         <div class='first'>
             Text 4
         </div>
         <my_tag class="second">
             Text 5
                 <span class='third'> Text 6
                 </span>              
         </my_tag>
     </body>
</html>
"""
driver.get("data:text/html;charset=utf-8,{html_content}".format(html_content=html_content))

What I'm trying to do is find each span element using XPath, print its text, and then print the text of that element's parent. The final output should be something like:

Text 3
Text 2
Text 6
Text 5

I can get the text of span like this:

el = driver.find_elements_by_xpath("*//span")
for i in el:
   print(i.text)

With the output being:

Text 3
Text 6

But when I try to get the parent's (and only the parent's) text by using:

elp = driver.find_elements_by_xpath("*//span/..")
for i in elp:
   print(i.text)

The output is:

Text 2 Text 3
Text 5 Text 6

The XPath expressions *//span/.. and //span/../text() usually (but not always, depending on which XPath test site is being used) evaluate to:

Text 2
Text 5

which is what I need for my for loop.

Hence the confusion. So I guess what I'm looking for is a for loop which, in pseudo code, looks like:

 el = driver.find_elements_by_xpath("*//span")
 for i in el:
    print(i.text)
    print(i.parent.text) #trying this in real life raises an error....

Upvotes: 0

Views: 1497

Answers (4)

Jack Fleeting

Reputation: 24930

I know I already accepted @JeffC's answer, but in the course of working on this question something occurred to me. It's very likely overkill, but it's an interesting approach and, for the sake of future generations, I figured I might as well post it here.

The idea involves using BeautifulSoup. The reason is that BS has a couple of methods for removing nodes from the tree. One of them that is useful here (and for which, to my knowledge, Selenium has no equivalent) is decompose(). We can use decompose() to suppress the second part of the parent's text, which lives inside a span tag, by removing that tag and its content. So we import BS and start with @JeffC's answer:

from bs4 import BeautifulSoup
elp = driver.find_elements_by_css_selector("span.third")

for i in elp:
    print(i.text)
    s = i.find_element_by_xpath("./..").get_attribute("innerHTML")

and here we switch to bs4 (still inside the loop):

    content = BeautifulSoup(s, 'html.parser')
    content.find('span').decompose()
    print(content.text)

And the output, without string manipulation, regex, or whatnot, is:

Text 3   
      Text 2

Text 6
      Text 5

Upvotes: 2

supputuri

Reputation: 14135

Here is a Python function that retrieves the text of the parent node only, excluding the text of its children.

def get_text_exclude_children(element):
    # Concatenate only the text nodes that are direct children of `element`.
    return driver.execute_script(
        """
        var parent = arguments[0];
        var child = parent.firstChild;
        var textValue = "";
        while (child) {
            if (child.nodeType === Node.TEXT_NODE) {
                textValue += child.textContent;
            }
            child = child.nextSibling;
        }
        return textValue;""",
        element).strip()

This is how to use the method in your case:

elements = driver.find_elements_by_css_selector("span.third")
for element in elements:
    print(element.text)
    # "./parent::*" is evaluated relative to the span that was already located
    print(get_text_exclude_children(element.find_element_by_xpath("./parent::*")))

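The same idea (collect only the text nodes that are direct children of the parent) can be checked without a browser. This is a rough sketch, not the Selenium approach above: it assumes the body of the question's markup is well-formed enough to parse with the standard library's xml.etree.ElementTree, and own_text is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The body of the question's page; assumed here to be well-formed XML.
MARKUP = """
<body>
    <div class='first'>Text 1</div>
    <div class="second">
        Text 2
        <span class='third'> Text 3
        </span>
    </div>
    <div class='first'>Text 4</div>
    <my_tag class="second">
        Text 5
        <span class='third'> Text 6
        </span>
    </my_tag>
</body>
"""


def own_text(element):
    """Join the element's direct text nodes, skipping text inside children."""
    # ElementTree keeps the text before the first child in .text and the
    # text that follows each child in that child's .tail.
    parts = [element.text or ""]
    parts.extend(child.tail or "" for child in element)
    return "".join(parts).strip()


root = ET.fromstring(MARKUP)
# ".//span/.." selects every element that has a span child.
for parent in root.findall(".//span/.."):
    for span in parent.findall("span"):
        print(span.text.strip())
        print(own_text(parent))
```

With this markup the loop prints Text 3, Text 2, Text 6, Text 5, one per line, which is the order the question asks for.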

Upvotes: 1

JeffC

Reputation: 25596

There are probably a few ways to do this. Here's one:

elp = driver.find_elements_by_css_selector("span.third")
for i in elp:
    print(i.text)
    s = i.find_element_by_xpath("./..").get_attribute("innerHTML")
    print(s.split('<')[0].strip())

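I used a simple CSS selector to find the child elements ("Text 3" and "Text 6"). I loop through those elements, print each one's .text, then navigate up one level to the parent and print its text as well. As the OP noted, printing the parent's .text also prints the child's text. To get around this, we get the parent's innerHTML, keep everything before the first '<' (the start of the child tag), and strip the surrounding whitespace. Note that this only works when the parent's own text comes before its first child tag, as it does here.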
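To see what the split does without a browser, here is a minimal sketch that applies it to a hard-coded string. The inner_html value is an assumption: it is roughly what get_attribute("innerHTML") would return for the second div in the question, and own_text_before_first_tag is a hypothetical helper name, not part of Selenium.

```python
def own_text_before_first_tag(inner_html):
    """Return the text that precedes the first child tag in an innerHTML string."""
    # Everything before the first '<' is the parent's leading text node.
    return inner_html.split('<')[0].strip()


# Assumed innerHTML of <div class="second"> from the question's markup.
inner_html = """
             Text 2
                 <span class='third'> Text 3
                 </span>
         """
print(own_text_before_first_tag(inner_html))  # Text 2

# Limitation: text that follows a child tag is dropped entirely.
print(own_text_before_first_tag("<b>bold</b> trailing text") == "")  # True
```

If the parent's text can appear after a child element, an approach that walks the text nodes is the safer choice.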

To explain the XPath in more detail

./..
^ start at an existing node, the 'i' in 'i.find_element_*'. If you skip/remove this '.', you will start at the top of the DOM instead of at the child element you've already located.
  ^ go up one level, to find the parent

Upvotes: 2

murali selenium

Reputation: 3927

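i.parent.text will not work: in the Python bindings, a WebElement's .parent is the WebDriver instance the element was found from, not its parent node in the DOM. In Java I used to write something like:

 ele.get(i).findElement(By.xpath("parent::*")).getText(); // or "parent::div" if the parent is known to be a div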

Upvotes: 1
