KunLun

Reputation: 3225

How to get partial text of an element using Selenium

I have this HTML:

<div id="msg">

  <b>text1</b>
  <br>
  text2 <b>text3</b> text4

  <ul class="list">
    <li>...</li>
    <li>...</li>
    <li>...</li>
  </ul>

  text5

</div>

Using XPath, I want to extract the text inside div[@id='msg'] that comes before the ul.

Something like driver.findElement(By.xpath("xpath")).getText() should return: text1 text2 text3 text4

Is this possible, or should I use a different approach?

Upvotes: 0

Views: 2374

Answers (2)

supputuri

Reputation: 14135

Just want to share another idea.

You can get the element's outerHTML, cut it off at the "ul" tag, and then remove the HTML tags from what remains. After that you can adjust the resulting string as needed.

I got almost exactly the text you are looking for using JavaScript. The snippet is pasted below for reference; you can do the same in Java.

oHTML = document.querySelector("div#msg").outerHTML
oHTML.substring(0,oHTML.search('<ul')).replace(/<.*>/,'').replace(/<\/?[^>]+(>|$)/g, "").replace(/\n/g, " ").trim()

You can run this in the browser console to see the output. Below is the JavaScript output.

text1      text2 text3 text4
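For the "do the same in Java" part, here is a minimal sketch of the same idea using only the JDK. The class and method names (TextBeforeUl, textBeforeUl) are made up for illustration, and it assumes you already have the markup as a String, for example from element.getAttribute("outerHTML") in Selenium. Unlike the JavaScript above, it collapses runs of whitespace into a single space. It does not decode HTML entities and it is not a real HTML parser, so treat it as a starting point only.

```java
import java.util.regex.Pattern;

// Hypothetical helper: keeps the text that appears before the first <ul in a markup string.
class TextBeforeUl {
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String textBeforeUl(String html) {
        // Cut the markup at the first "<ul"; keep everything if there is no <ul.
        int cut = html.indexOf("<ul");
        String head = cut >= 0 ? html.substring(0, cut) : html;
        // Drop the tags, then collapse whitespace runs and trim the ends.
        String noTags = TAGS.matcher(head).replaceAll("");
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<div id=\"msg\">\n"
                + "  <b>text1</b>\n"
                + "  <br>\n"
                + "  text2 <b>text3</b> text4\n"
                + "  <ul class=\"list\"><li>...</li></ul>\n"
                + "  text5\n"
                + "</div>";
        System.out.println(textBeforeUl(html)); // prints: text1 text2 text3 text4
    }
}
```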

Upvotes: 0

undetected Selenium

Reputation: 193058

As per @kjhughes in this discussion, XPath is for selection, not manipulation. You can select nodes as they exist in an XML document, but you cannot transform those nodes.

In your case, if your XML document includes this node:

<div id="msg">
  <b>text1</b>
  <br>
  text2 <b>text3</b> text4
  <ul class="list">
    <li>...</li>
    <li>...</li>
    <li>...</li>
  </ul>
  text5
</div>

You can select the <div> node through //div[@id='msg'], but the selected node will appear as it appears in the source XML, that is, including its <ul> child with class list and everything inside it.

If you want to manipulate or transform a node selected via XPath (for example, to exclude some of its child elements), you'll have to use the hosting language (XSLT, JavaScript, Python, Java, C#, etc.) to manipulate the selection.
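To illustrate the split between selecting with XPath and transforming in the host language, here is a small sketch that uses the JDK's own javax.xml.xpath rather than Selenium. It assumes the markup is available as well-formed XML (note <br/> instead of <br>); Selenium's findElement only returns elements, so this exact call does not work through By.xpath. The class and method names are hypothetical. XPath picks every child node of the div that is not the ul and has no ul before it, and plain Java then joins and tidies the text.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: XPath selects the nodes, Java builds the final string.
class NodesBeforeUl {
    static String textBeforeUl(String xml) {
        // Selection: child nodes of the div that come before the first <ul>.
        String expr = "//div[@id='msg']/node()[not(self::ul) and not(preceding-sibling::ul)]";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            // Manipulation: concatenate the text of each selected node in Java.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                sb.append(nodes.item(i).getTextContent());
            }
            return sb.toString().replaceAll("\\s+", " ").trim();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath on the given XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<div id=\"msg\">\n"
                + "  <b>text1</b>\n"
                + "  <br/>\n"
                + "  text2 <b>text3</b> text4\n"
                + "  <ul class=\"list\"><li>...</li></ul>\n"
                + "  text5\n"
                + "</div>";
        System.out.println(textBeforeUl(xml)); // prints: text1 text2 text3 text4
    }
}
```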


Solution

To extract the texts individually you can use the following solution:

// Assumes By, JavascriptExecutor and WebElement are imported from org.openqa.selenium and driver is an open WebDriver.
WebElement myElement = driver.findElement(By.xpath("//div[@id='msg']"));
JavascriptExecutor js = (JavascriptExecutor) driver;
String text1 = myElement.findElement(By.xpath("./b")).getAttribute("innerHTML");
// childNodes also counts whitespace text nodes. For the HTML as posted:
// [0] whitespace, [1] <b>text1</b>, [2] whitespace, [3] <br>, [4] text2, [5] <b>text3</b>, [6] text4, [7] <ul>, [8] text5
String text2 = js.executeScript("return arguments[0].childNodes[4].textContent;", myElement).toString().trim();
String text3 = js.executeScript("return arguments[0].childNodes[5].textContent;", myElement).toString().trim();
String text4 = js.executeScript("return arguments[0].childNodes[6].textContent;", myElement).toString().trim();
String text5 = js.executeScript("return arguments[0].lastChild.textContent;", myElement).toString().trim();

Upvotes: 1
