Tybs

Reputation: 532

Why does normalize-space(text()) not work with a preceding child element?

No doubt that this is extremely basic, but it just won't "click" for me, despite the research that I've done so far. Given the following two HTML examples:

Example 1

<div _ngcontent-c35="" class="row facet-container ng-star-inserted">
    <div _ngcontent-c35="" class="searchresult-header">
        Locatie
    </div>
</div>

Example 2

<div _ngcontent-c42="" class="row facet-panel ng-star-inserted">
    <div _ngcontent-c42="" class="facet-panel-header brand-pointer" data-target="#ft5" data-toggle="collapse">
        <span _ngcontent-c42="" class="icon-plus ng-star-inserted" data-target="#ft5" data-toggle="collapse">
        </span> 
        Locatie
    </div>
    <div _ngcontent-c42="" class="collapse" id="ft5">
    </div>
</div>

Now I have the following XPath expression:

//div[.//div[normalize-space(text())='Locatie']]

According to other questions and websites about XPath, text() selects the text nodes that are direct children of the context node. Therefore, in example #1, I expect to retrieve the first child "div" element. This happens correctly: no issues there.

I expect the same result in example #2. However, this is not the case: apparently the "span" element disrupts this specific search. When I manually remove it, I successfully retrieve the required "div" element. Why is the search disrupted? The text should still be a direct child of the div element, whether or not the span element is there.

TLDR: Why does the "span" element prevent me from finding the second "div" element in example #2?

Upvotes: 3

Views: 2246

Answers (3)

Alejandro

Reputation: 1882

As JaSON answered, this is because of the signature of the normalize-space() function. From the specs:

Function: string normalize-space(string?)

In XPath 1.0, whenever a string argument is needed, the language applies a type conversion by means of the string() function. From the specs:

A node-set is converted to a string by returning the string-value of the node in the node-set that is first in document order. If the node-set is empty, an empty string is returned.

So, the resulting node-set from the text() node test is reduced to the first node in document order and then that node is converted to its string-value.

This is where the often-overlooked whitespace-only text nodes come into play. Your div element has two text nodes:

<div>
    <div>
        <!-- HERE ENDS THE FIRST --><span>
        </span> 
        Locatie
    <!-- HERE ENDS THE SECOND --></div>
    <div>
    </div>
</div>
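
To make the two text nodes visible, here is a minimal sketch using Python's standard xml.dom.minidom on a simplified copy of Example 2 (attributes dropped). The normalize_space helper is a hypothetical stand-in for the XPath function, approximated with str.split(), and first_only mimics the "first node in document order" conversion rule quoted above; it is not an XPath engine.

```python
from xml.dom import minidom

# Simplified version of Example 2 (attributes dropped for brevity).
MARKUP = """<div>
    <div>
        <span>
        </span>
        Locatie
    </div>
    <div>
    </div>
</div>"""

doc = minidom.parseString(MARKUP)
outer = doc.documentElement

# First child *element* of the outer div (whitespace text nodes are skipped).
inner = next(n for n in outer.childNodes if n.nodeType == n.ELEMENT_NODE)

# Direct text-node children of the inner div, in document order.
text_nodes = [n.data for n in inner.childNodes if n.nodeType == n.TEXT_NODE]


def normalize_space(s):
    # Approximation of XPath normalize-space(): trim and collapse whitespace.
    return " ".join(s.split())


# normalize-space(text()) only ever sees the first text node.
first_only = normalize_space(text_nodes[0]) if text_nodes else ""

# text()[normalize-space()='Locatie'] tests every text node individually.
any_match = any(normalize_space(t) == "Locatie" for t in text_nodes)

print(len(text_nodes), repr(first_only), any_match)
```

The first text node is the whitespace between the opening div tag and the span, so the first-node rule yields an empty string, while testing each text node separately still finds "Locatie".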

Whenever you have mixed content markup, it's better to use the string-value rather than the text nodes. Otherwise you should use this expression:

//div[.//div/text()[normalize-space()='Locatie']]

Upvotes: 3

JaSON

Reputation: 4869

I guess that's because normalize-space(text())='Locatie' checks only the first child text node (which contains nothing but whitespace, so it normalizes to an empty string), while you need to check the second one:

//div[.//div[normalize-space(text()[2])='Locatie']]

If you need a generic XPath that works for both cases, try:

//div[normalize-space(div)='Locatie']
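
As a rough illustration of why the string-value approach handles both examples, the sketch below uses Python's standard xml.etree.ElementTree on simplified copies of the two snippets. The helper names are hypothetical, and the check only mimics what normalize-space(div)='Locatie' does for the outer div (it takes the first child div and compares its normalized string-value); it does not evaluate the XPath itself.

```python
import xml.etree.ElementTree as ET

# Simplified copies of the two examples (attributes dropped for brevity).
EXAMPLE_1 = """<div>
    <div>
        Locatie
    </div>
</div>"""

EXAMPLE_2 = """<div>
    <div>
        <span>
        </span>
        Locatie
    </div>
    <div>
    </div>
</div>"""


def string_value(elem):
    # XPath string-value of an element: all descendant text, concatenated.
    return "".join(elem.itertext())


def first_child_div_matches(markup, wanted):
    # Mimics normalize-space(div)='...' evaluated on the outer div:
    # only the first child div (in document order) is considered.
    root = ET.fromstring(markup)
    first_div = root.find("div")
    if first_div is None:
        return False
    return " ".join(string_value(first_div).split()) == wanted


print(first_child_div_matches(EXAMPLE_1, "Locatie"))
print(first_child_div_matches(EXAMPLE_2, "Locatie"))
```

Because the string-value concatenates every descendant text node, the whitespace around the span no longer matters once it is normalized.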

Upvotes: 2

Jack Fleeting

Reputation: 24930

It may have something to do with the whitespace-only text nodes (it's way over my pay grade...), because with this change of focus, the following expression seems to work with most (not all) XPath testers:

.//div[text()[contains(.,'Locat')]]

Upvotes: 0
