Reputation: 2605
I have an element with three occurrences on the page. If I match it with the XPath expression //div[@class='col-md-9 col-xs-12']
, I get all three occurrences as expected.
Now I try to post-process the matched element on the fly with
substring-before(//div[@class='col-md-9 col-xs-12'], 'Bewertungen')
, to get the string before the word "Bewertungen",
normalize-space(//div[@class='col-md-9 col-xs-12'])
, to clean up redundant whitespace,
normalize-space(substring-before(//div[@class='col-md-9 col-xs-12'], 'Bewertungen'))
, to do both.
The problem with the last three expressions is that they extract only the first occurrence of the element. It makes no difference whether I add /text()
after the matching expression.
I don't understand how adding normalize-space
and/or substring-before
changes the "main" expression so that it stops recognizing multiple occurrences of the targeted element and returns only the first. Without these functions it matches everything as it should.
How can I adjust XPath expression no. 3 to get all occurrences of the element?
Example url is https://www.provenexpert.com/de-de/jazzyshirt/
Upvotes: 0
Views: 688
Reputation: 163595
Note that in XPath 1.0, functions like substring-after()
, if given a set of three nodes as input, ignore all nodes except the first. XPath 2.0 changes this: it gives you an error.
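The first-node rule can be modelled in a few lines of plain Python. This is only a rough sketch of the XPath 1.0 conversion semantics, not an XPath engine; the helper names `xpath1_string` and `substring_before` and the sample strings are made up for illustration.

```python
def xpath1_string(node_set):
    """Model of the XPath 1.0 node-set-to-string conversion: the result
    is the string-value of the first node in document order, or "" if
    the node-set is empty."""
    return node_set[0] if node_set else ""


def substring_before(value, marker):
    """Model of substring-before(): the text before the first occurrence
    of marker, or "" when marker does not occur."""
    head, sep, _ = value.partition(marker)
    return head if sep else ""


# Hypothetical string-values of three matched div elements.
matched = ["10 Bewertungen", "25 Bewertungen", "7 Bewertungen"]

# The function is handed all three nodes but only ever sees the first.
result = substring_before(xpath1_string(matched), "Bewertungen")
print(repr(result))
```

The second and third entries never reach `substring_before`, which mirrors what the question describes.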
In XPath 3.1 you can apply a function to each of the nodes using the simple map operator, "!": //div[condition] ! substring-before(normalize-space(), 'Bewertung')
. That returns a sequence of 3 strings. There's no equivalent in XPath 1.0, because there's no data type in XPath 1.0 that can represent a sequence of strings.
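If you are limited to XPath 1.0, one common workaround is to select the nodes with XPath and do the string work per node in the host language. The sketch below assumes Python and its standard-library `xml.etree.ElementTree`, which supports only a small XPath subset; the `SAMPLE` markup and the helper names are hypothetical, and a real page would probably need an HTML parser because it is rarely well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the real page.
SAMPLE = """
<html><body>
  <div class="col-md-9 col-xs-12">  10
     Bewertungen auf A </div>
  <div class="col-md-9 col-xs-12"> 25   Bewertungen auf B</div>
  <div class="col-md-9 col-xs-12">7 Bewertungen auf C</div>
</body></html>
"""


def normalize_space(text):
    """Rough equivalent of normalize-space(): trim and collapse whitespace."""
    return " ".join(text.split())


def substring_before(text, marker):
    """Rough equivalent of substring-before(): "" if marker is absent."""
    head, sep, _ = text.partition(marker)
    return head if sep else ""


def texts_before(root, marker):
    """Select every matching div, then process each one separately."""
    nodes = root.findall(".//div[@class='col-md-9 col-xs-12']")
    return [
        substring_before(normalize_space("".join(node.itertext())), marker)
        for node in nodes
    ]


root = ET.fromstring(SAMPLE)
values = texts_before(root, "Bewertungen")
print(values)
```

The XPath part only selects nodes here; the list comprehension plays the role that "!" plays in XPath 3.1.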
In XPath 2.0 you can often achieve the same effect using "/" instead of "!", but it has restrictions.
When asking questions on StackOverflow, please always mention which version of XPath you are using. We tend to assume that if people don't say, they're probably using 1.0, because 1.0 products don't generally advertise their version number.
Upvotes: 1
Reputation: 24930
The problem is that both normalize-space()
and substring-before()
have a required cardinality of 1, meaning they accept only a single item as their argument. Each of your expressions passes them a sequence of three nodes, which these two functions cannot process. (I probably didn't express the problem precisely, but I think this is the general idea.)
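In light of that, try the following, which calls the functions once per matched div (this needs XPath 2.0 or later, because a function call is used as the last path step):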
//div[@class='col-md-9 col-xs-12']/substring-before(normalize-space(.), 'Bewertung')
Upvotes: 1