Reputation: 24930
I'm somewhat (or very) confused about the following:
from selenium.webdriver import Chrome
driver = Chrome()
html_content = """
<html>
<head></head>
<body>
<div class='first'>
Text 1
</div>
<div class="second">
Text 2
<span class='third'> Text 3
</span>
</div>
<div class='first'>
Text 4
</div>
<my_tag class="second">
Text 5
<span class='third'> Text 6
</span>
</my_tag>
</body>
</html>
"""
driver.get("data:text/html;charset=utf-8,{html_content}".format(html_content=html_content))
What I'm trying to do is find each span element using XPath, print its text, and then print the text of that element's parent. The final output should be something like:
Text 3
Text 2
Text 6
Text 5
I can get the text of span like this:
el = driver.find_elements_by_xpath("*//span")
for i in el:
    print(i.text)
With the output being:
Text 3
Text 6
But when I try to get the parent's (and only the parent's) text by using:
elp = driver.find_elements_by_xpath("*//span/..")
for i in elp:
    print(i.text)
The output is:
Text 2 Text 3
Text 5 Text 6
The XPath expressions *//span/.. and //span/../text() usually (but not always, depending on which XPath test site is being used) evaluate to:
Text 2
Text 5
which is what I need for my for loop.
Hence the confusion. So I guess what I'm looking for is a for loop which, in pseudocode, looks like:
el = driver.find_elements_by_xpath("*//span")
for i in el:
    print(i.text)
    print(i.parent.text)  # trying this in real life raises an error....
Upvotes: 0
Views: 1497
Reputation: 24930
I know I already accepted @JeffC's answer, but something occurred to me while working on this question. It's very likely overkill, but it's an interesting approach and, for the sake of future generations, I figured I might as well post it here too.
The idea involves using BeautifulSoup, because BS has a couple of methods for removing nodes from the tree. One that is useful here (and for which, to my knowledge, Selenium has no equivalent) is decompose(). We can use decompose() to suppress the second part of the parent's text, the part contained inside the span tag, by eliminating that tag and its content. So we import BS and start with @JeffC's answer:
from bs4 import BeautifulSoup
elp = driver.find_elements_by_css_selector("span.third")
for i in elp:
    print(i.text)
    s = i.find_element_by_xpath("./..").get_attribute("innerHTML")
    # and here switch to bs4
    content = BeautifulSoup(s, 'html.parser')
    content.find('span').decompose()  # drop the span and everything inside it
    print(content.text)
And the output, without string manipulation, regex, or whatnot is...:
Text 3
Text 2
Text 6
Text 5
Upvotes: 2
Reputation: 14135
Here is a Python function that retrieves the text of the parent node only, excluding the text of its child elements.
def get_text_exclude_children(element):
    return driver.execute_script(
        """
        var parent = arguments[0];
        var child = parent.firstChild;
        var textValue = "";
        while (child) {
            if (child.nodeType === Node.TEXT_NODE)
                textValue += child.textContent;
            child = child.nextSibling;
        }
        return textValue;""",
        element).strip()
This is how to use the method in your case:
elements = driver.find_elements_by_css_selector("span.third")
for eleNum in range(len(elements)):
    print(driver.find_element_by_xpath("(//span[@class='third'])[" + str(eleNum + 1) + "]").text)
    print(get_text_exclude_children(driver.find_element_by_xpath("(//span[@class='third'])[" + str(eleNum + 1) + "]/parent::*")))
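The JavaScript above walks the parent's direct children and keeps only the text nodes. For trying that logic without a browser, here is a minimal sketch of the same walk using the standard library's xml.dom.minidom. The function name own_text and the sample markup are assumptions made for illustration; they are not part of Selenium.

```python
from xml.dom import minidom


def own_text(element):
    """Concatenate only the direct text-node children of a DOM element."""
    parts = []
    child = element.firstChild
    while child is not None:
        # Keep text nodes; skip element nodes such as the nested <span>.
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        child = child.nextSibling
    return "".join(parts).strip()


# Hypothetical sample, shaped like the parent/child pair in the question.
sample = '<div class="second">Text 2 <span class="third">Text 3</span></div>'
parent = minidom.parseString(sample).documentElement
print(own_text(parent))
```

Like the JavaScript version, this also picks up text that follows the child element, because every direct text node is visited.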
Upvotes: 1
Reputation: 25596
There are probably a few ways to do this. Here's one:
elp = driver.find_elements_by_css_selector("span.third")
for i in elp:
    print(i.text)
    s = i.find_element_by_xpath("./..").get_attribute("innerHTML")
    print(s.split('<')[0].strip())
I used a simple CSS selector to find the child elements ("Text 3" and "Text 6"). I loop through those elements, print each one's .text, and then navigate up one level to find the parent and print its text as well. As OP noted, printing the parent's text also prints the child's. To get around this, we get the parent's innerHTML, split it at the first '<', and strip the surrounding whitespace.
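One caveat with split('<')[0]: it keeps only the text that comes before the first child tag, so any parent text that follows the child would be dropped. Below is a minimal sketch of a more general approach, assuming the innerHTML string is handed to a hypothetical helper, top_level_text, built on the standard library's html.parser. The helper name and the short list of void tags are assumptions for illustration.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; a partial list, for illustration only.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class _TopLevelText(HTMLParser):
    """Collect text that is not nested inside any element."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(self.depth - 1, 0)

    def handle_data(self, data):
        # Depth 0 means the text belongs to the parent itself.
        if self.depth == 0:
            self.parts.append(data)


def top_level_text(inner_html):
    parser = _TopLevelText()
    parser.feed(inner_html)
    parser.close()
    # Collapse runs of whitespace left behind by the removed children.
    return " ".join("".join(parser.parts).split())


print(top_level_text("\nText 2\n<span class='third'> Text 3\n</span>\n"))
```

In the loop above, this would be called as top_level_text(s) in place of s.split('<')[0].strip().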
To explain the XPath in more detail:
./..
.   starts at an existing node, the 'i' in 'i.find_element_*'. If you skip/remove this '.', you will start at the top of the DOM instead of at the child element you've already located.
/.. goes up one level, to find the parent
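As a side note on why results differ between tools: //span/.. selects the parent element, and an element's text (Selenium's .text) covers everything rendered inside it, child included. //span/../text() selects only the parent's own text nodes, which Selenium's element finders cannot return because text nodes are not elements. Here is a rough offline illustration using the standard library's xml.etree.ElementTree instead of Selenium; the trimmed-down markup is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the question's markup (well-formed, so it parses as XML).
markup = """
<body>
<div class="second">
Text 2
<span class="third"> Text 3
</span>
</div>
<my_tag class="second">
Text 5
<span class="third"> Text 6
</span>
</my_tag>
</body>
""".strip()

root = ET.fromstring(markup)

results = []
# ".//span/.." selects each element that has a span child.
for parent in root.findall(".//span/.."):
    span = parent.find("span")
    # All text inside the parent, child included (comparable to Selenium's .text).
    whole = " ".join("".join(parent.itertext()).split())
    # In ElementTree, .text holds only the text before the first child element.
    own = (parent.text or "").strip()
    results.append((span.text.strip(), own, whole))
    print(span.text.strip())
    print(own)
```

Note that ElementTree's .text is narrower than text(): it misses parent text that follows a child, which happens not to matter for this markup.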
Upvotes: 2
Reputation: 3927
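i.parent.text will not work: in Selenium's Python bindings, an element's parent attribute refers to the WebDriver instance the element was found from, not to its parent in the DOM. In Java I used to write something like:
ele.get(i).findElement(By.xpath("parent::div")).getText(); // the XPath is the path to the parent, e.g. parent::div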
Upvotes: 1