zorglub76

Reputation: 4942

Getting text from a node

I have a piece of HTML like this:

<a href="/something">
     Title
    <span>Author</span>
</a>

I have a WebElement that matches this HTML. How can I extract only "Title" from it? The .getText() method returns "Title\nAuthor".

Upvotes: 7

Views: 3803

Answers (5)

GirishB

Reputation: 544

Check whether an element matches the XPath "//a[normalize-space(text())='Title']". The check succeeds only if the text directly inside the 'a' tag is 'Title'.

Upvotes: 0

Bhuvanesh Mani

Reputation: 1464

You can use JavascriptExecutor to iterate over the child nodes, pick out the text node containing 'Title', and return its content, like this:

WebElement link = driver.findElement(By.xpath("//a[@href='/something']"));
JavascriptExecutor js = (JavascriptExecutor) driver;
// Return the content of the first text node among the link's direct children.
String titleText = (String) js.executeScript(
    "for (var i = 0; i < arguments[0].childNodes.length; i++) {"
    + " if (arguments[0].childNodes[i].nodeName == '#text') {"
    + " return arguments[0].childNodes[i].textContent; } }", link);

The JavaScript above walks the link's child nodes, which include the text node ('Title') and the SPAN ('Author'), and returns the text content of the first text node it finds.
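The same child-node filtering can be tried offline, without a browser. The following is only a sketch in Python using the standard library's xml.dom.minidom on a copy of the markup from the question; the helper name own_text is made up for illustration and is not part of Selenium.

```python
from xml.dom import minidom, Node

# Copy of the markup from the question (it happens to be well-formed XML).
HTML = '<a href="/something">\n     Title\n    <span>Author</span>\n</a>'


def own_text(element):
    """Concatenate only the direct text-node children of a DOM element."""
    parts = [
        child.data
        for child in element.childNodes
        if child.nodeType == Node.TEXT_NODE  # skips the <span> element
    ]
    return "".join(parts).strip()


link = minidom.parseString(HTML).documentElement
print(own_text(link))  # prints: Title
```

The <span> is an element node, so it is skipped, and the whitespace-only text node after it disappears when the result is stripped.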

Note: Before this, I tried selecting the text node directly in the XPath, as below, but WebDriver throws an InvalidSelectorException because the expression must resolve to an element, not a text node.

WebElement link = driver.findElement(By.xpath("//a[@href='/something']/text()"));

Upvotes: 0

Vftdan

Reputation: 1

If using Python:

[x['textContent'].strip() for x in element.get_property('childNodes') if isinstance(x, dict)]

Where element is your element.

This will return ['Title', '']; the empty string comes from the whitespace-only text node after the span.
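If the empty entries are unwanted, the filtering can be wrapped in a small helper. This is a sketch only: direct_text is a hypothetical name, and sample is a hand-built stand-in for the childNodes list, assuming (as the one-liner above implies) that text nodes arrive as dicts with a 'textContent' key while element children arrive as some other object.

```python
def direct_text(child_nodes):
    """Join the non-empty text of dict-shaped (text node) entries."""
    texts = (
        (node.get("textContent") or "").strip()
        for node in child_nodes
        if isinstance(node, dict)  # element children are not dicts
    )
    return " ".join(text for text in texts if text)


# Hand-built stand-in for element.get_property('childNodes'):
# text node, <span> element (placeholder object), trailing whitespace.
sample = [
    {"textContent": "\n     Title\n    "},
    object(),
    {"textContent": "\n"},
]
print(direct_text(sample))  # prints: Title
```

With a real element the call would be direct_text(element.get_property('childNodes')), provided the driver returns the nodes in the shape assumed here.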

Upvotes: 0

supputuri

Reputation: 14145

Here is a method written in Python.

def get_text_exclude_children(element):
    # Assumes an existing WebDriver instance named driver.
    return driver.execute_script(
        """
        var parent = arguments[0];
        var child = parent.firstChild;
        var textValue = "";
        while (child) {
            // Collect text nodes only; element children are skipped.
            if (child.nodeType === Node.TEXT_NODE) {
                textValue += child.textContent;
            }
            child = child.nextSibling;
        }
        return textValue;""",
        element).strip()

How to use it:

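from selenium.webdriver.common.by import By

# find_element_by_xpath was removed in Selenium 4; find_element(By.XPATH, ...) replaces it.
link_element = driver.find_element(By.XPATH, "//a[@href='your_href_goes_here']")
link_only_text = get_text_exclude_children(link_element)
print(link_only_text)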

Use whatever locator strategy suits you to get the element; the method only needs the element whose text you want (without its children's text).

Upvotes: 0

Ross Patterson

Reputation: 9569

You can't do this in the WebDriver API; you have to do it in your own code. For example:

String textOfA = theAElement.getText();
String textOfSpan = theSpanElement.getText();
// Drop the span's text from the end, then trim the leftover newline.
String text = textOfA.substring(0, textOfA.length() - textOfSpan.length()).trim();

Note that the trailing newline is actually part of the text of the <a> element, so if you don't want it, you need to strip it.
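The string arithmetic itself can be checked without a browser. Below is a minimal Python sketch of the same idea; strip_child_suffix is a hypothetical helper name, and it assumes the child's text sits at the very end of the parent's text, as it does in the question's markup.

```python
def strip_child_suffix(parent_text, child_text):
    """Remove child_text from the end of parent_text, then trim whitespace."""
    # Guard against an empty child: parent_text[:-0] would be an empty string.
    if child_text and parent_text.endswith(child_text):
        parent_text = parent_text[: -len(child_text)]
    return parent_text.strip()


# Values taken from the question: the <a> reports "Title\nAuthor",
# the <span> reports "Author".
print(strip_child_suffix("Title\nAuthor", "Author"))  # prints: Title
```

If the child element is not the last thing inside the parent, this suffix approach does not apply, and reading the text nodes via JavaScript is the safer route.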

Upvotes: 7
