No doubt that this is extremely basic, but it just won't "click" for me, despite the research that I've done so far. Given the following two HTML examples:
Example 1
<div _ngcontent-c35="" class="row facet-container ng-star-inserted">
<div _ngcontent-c35="" class="searchresult-header">
Locatie
</div>
</div>
Example 2
<div _ngcontent-c42="" class="row facet-panel ng-star-inserted">
<div _ngcontent-c42="" class="facet-panel-header brand-pointer" data-target="#ft5" data-toggle="collapse">
<span _ngcontent-c42="" class="icon-plus ng-star-inserted" data-target="#ft5" data-toggle="collapse">
</span>
Locatie
</div>
<div _ngcontent-c42="" class="collapse" id="ft5">
</div>
</div>
Now I have the following XPath expression:
//div[.//div[normalize-space(text())='Locatie']]
According to other questions and websites about XPath, text() selects the text nodes that are direct children of the node we're searching on. Therefore, in example #1, I expect to retrieve the first child "div" element. This works correctly: no issues there.
I expect the same result in example #2. However, this is not the case: apparently the "span" element disrupts this specific search. When I manually remove it, I successfully retrieve the required "div" element. Why is the search disrupted? The text should still be a direct child of the div element, whether or not the span element is there.
TL;DR: Why does the "span" element prevent me from finding the second "div" element in example #2?
As Jason answered, this is because of the signature of the normalize-space() function. From the spec:
Function: string normalize-space(string?)
In XPath 1.0, whenever a string argument is needed, the language applies a type conversion by means of the string() function. From the spec:
A node-set is converted to a string by returning the string-value of the node in the node-set that is first in document order. If the node-set is empty, an empty string is returned.
So the node-set resulting from the text() node test is reduced to its first node in document order, and that node is then converted to its string-value.
This is where the often-overlooked whitespace-only text nodes come into play: your div element has two text nodes, and the first one contains only whitespace:
<div>
<div>
<!-- HERE ENDS THE FIRST --><span>
</span>
Locatie
<!-- HERE ENDS THE SECOND --></div>
<div>
</div>
</div>
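Whenever you have mixed-content markup, it's better to test the string-value rather than the individual text nodes (normalize-space() with no argument uses the string-value of the context node). If you do want to test the text nodes, use this expression, which applies normalize-space() to each text node separately: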
//div[.//div/text()[normalize-space()='Locatie']]
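To see the conversion rule at work, here is a minimal Python sketch (standard library only) that evaluates both predicates by hand with `xml.dom.minidom`, because the standard library has no full XPath 1.0 engine. It is an illustration under assumptions, not an XPath implementation: the helper names (`normalize_space`, `text_children`, `first_text_matches`, `any_text_matches`, `find_div`) are hypothetical, and the markup is the question's two examples with the Angular attributes removed.

```python
from xml.dom import minidom

# The question's two examples, with the Angular-specific attributes dropped.
EXAMPLE_1 = """<div class="row facet-container">
<div class="searchresult-header">
Locatie
</div>
</div>"""

EXAMPLE_2 = """<div class="row facet-panel">
<div class="facet-panel-header brand-pointer">
<span class="icon-plus">
</span>
Locatie
</div>
<div class="collapse" id="ft5">
</div>
</div>"""


def normalize_space(s):
    # Approximates XPath normalize-space(): trim and collapse whitespace runs.
    # (str.split() recognizes more whitespace characters than XPath 1.0 does.)
    return " ".join(s.split())


def text_children(element):
    # Like the XPath step text(): direct child text nodes only.
    return [n for n in element.childNodes if n.nodeType == n.TEXT_NODE]


def first_text_matches(element, target):
    # normalize-space(text()) = target: the node-set is converted to the
    # string-value of its first node in document order ("" if it is empty).
    texts = text_children(element)
    first = texts[0].data if texts else ""
    return normalize_space(first) == target


def any_text_matches(element, target):
    # text()[normalize-space() = target]: each text node is tested on its own.
    return any(normalize_space(t.data) == target for t in text_children(element))


def find_div(markup, css_class):
    # First <div> whose class attribute contains the given class token.
    doc = minidom.parseString(markup)
    return next(d for d in doc.getElementsByTagName("div")
                if css_class in d.getAttribute("class").split())


header_1 = find_div(EXAMPLE_1, "searchresult-header")
header_2 = find_div(EXAMPLE_2, "facet-panel-header")

print([t.data for t in text_children(header_2)])  # ['\n', '\nLocatie\n']
for name, header in [("example 1", header_1), ("example 2", header_2)]:
    print(name, first_text_matches(header, "Locatie"),
          any_text_matches(header, "Locatie"))
# example 1 True True
# example 2 False True
```

In example #2 the first direct text node is the newline before `<span>`, so `normalize-space(text())` ends up comparing an empty string with `'Locatie'`, while the per-node predicate still finds the second text node.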
I guess that's because normalize-space(text())='Locatie' checks only the first child text node (which contains just whitespace, i.e. an empty string after normalization), while you need to check the second one:
//div[.//div[normalize-space(text()[2])='Locatie']]
If you need a generic XPath expression that works for both cases, try:
//div[normalize-space(div)='Locatie']
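A by-hand Python sketch (standard library only; the helper names are hypothetical and the markup is the question's, minus the Angular attributes) can illustrate why `text()[2]` only suits example #2 while `normalize-space(div)` handles both. Used as a string argument, the `div` node-set becomes the string-value of the first child `div`, which concatenates all of its descendant text, so the whitespace node before `<span>` no longer gets in the way. Like any node-set-to-string conversion, it only looks at that first child `div`.

```python
from xml.dom import minidom

# The question's two examples, with the Angular-specific attributes dropped.
EXAMPLE_1 = """<div class="row facet-container">
<div class="searchresult-header">
Locatie
</div>
</div>"""

EXAMPLE_2 = """<div class="row facet-panel">
<div class="facet-panel-header brand-pointer">
<span class="icon-plus">
</span>
Locatie
</div>
<div class="collapse" id="ft5">
</div>
</div>"""


def normalize_space(s):
    # Approximates XPath normalize-space(): trim and collapse whitespace runs.
    return " ".join(s.split())


def string_value(node):
    # XPath 1.0 string-value: a text node's data, or, for an element, the
    # concatenation of all its descendant text nodes in document order.
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


def second_text_matches(element, target):
    # normalize-space(text()[2]) = target; a missing second node yields "".
    texts = [n for n in element.childNodes if n.nodeType == n.TEXT_NODE]
    second = texts[1].data if len(texts) > 1 else ""
    return normalize_space(second) == target


def first_child_div_matches(element, target):
    # normalize-space(div) = target: only the first child div is converted.
    divs = [n for n in element.childNodes
            if n.nodeType == n.ELEMENT_NODE and n.tagName == "div"]
    value = string_value(divs[0]) if divs else ""
    return normalize_space(value) == target


results = {}
for name, markup in [("example 1", EXAMPLE_1), ("example 2", EXAMPLE_2)]:
    outer = minidom.parseString(markup).documentElement
    header = next(n for n in outer.childNodes if n.nodeType == n.ELEMENT_NODE)
    results[name] = (second_text_matches(header, "Locatie"),
                     first_child_div_matches(outer, "Locatie"))
    print(name, *results[name])
# example 1 False True
# example 2 True True
```

Example #1's header div has only one text node, so `text()[2]` is empty there, whereas the string-value check succeeds in both examples.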
It may have something to do with the whitespace text nodes (it's way over my pay grade...), because with this change of focus, the following expression seems to work with most (not all) XPath testers:
.//div[text()[contains(.,'Locat')]]