Clms

Reputation: 733

Xpath: Get Text After Element With Containing Text

I am looking for a way to get text which is not inside an HTML element:

<div class="col-sm-4">
  <strong>Handelnde Personen:</strong><br><br>
  <strong>Geschäftsführer</strong><br>
  Mr John Doe<br>
  Privatperson<br>
  .....<br>
  <br>

I want to get "Mr John Doe".

The only way I can see is to look for a strong element that contains "Geschäftsführer" and then take the text that follows it.

My idea so far:

//strong[contains(text(), 'Gesch')]/br/../text()

... but I simply can't make it work.

Also, is there a "wildcard" for strings, so that I could use

*esch*ftsf*hr*

for "Geschäftsführer"?

I highly appreciate your help, thanks!

Upvotes: 0

Views: 267

Answers (1)

Michael Kay

Reputation: 163585

Try

//strong[starts-with(., 'Gesch')]/following-sibling::text()[1]
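To see which text node that expression picks out, here is a minimal Python sketch that emulates it with the standard library. It is only an illustration: ElementTree does not support the following-sibling axis, so the helper `first_text_after` (a hypothetical name) walks the siblings by hand, and the fragment is an assumed XHTML-style copy of the markup in the question with the `<br>` tags closed so that it parses as XML. Unlike the XPath, the helper skips whitespace-only text and trims the result; in XPath you would wrap the result in normalize-space() to get the same trimming.

```python
import xml.etree.ElementTree as ET

# Assumed XHTML-style copy of the question's fragment (<br/> closed so it parses as XML).
FRAGMENT = """<div class="col-sm-4">
  <strong>Handelnde Personen:</strong><br/><br/>
  <strong>Geschäftsführer</strong><br/>
  Mr John Doe<br/>
  Privatperson<br/>
</div>"""


def first_text_after(parent, label_prefix):
    """Return the first non-blank text that follows, as a sibling, the
    <strong> child whose string value starts with label_prefix."""
    found = False
    for child in parent:
        if not found:
            value = "".join(child.itertext())  # string value, like "." in XPath
            if child.tag == "strong" and value.startswith(label_prefix):
                found = True
            else:
                continue
        # ElementTree keeps the text node that follows an element in .tail
        if child.tail and child.tail.strip():
            return child.tail.strip()
    return None


div = ET.fromstring(FRAGMENT)
name = first_text_after(div, "Gesch")
print(name)
```

The text is found in the tail of the `<br/>` after the matching strong element, because there is no text node between `</strong>` and that `<br/>`.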

As for wildcard matching, with XPath 2.0 you can use regular expressions:

//strong[matches(., '.*esch.*ftsf.*hr.*')]
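If you want to try the pattern outside an XPath 2.0 processor, a rough Python sketch looks like this. Python's `re.search` is unanchored, as matches() is, so the leading and trailing ".*" can be dropped; the sample labels are made up for illustration.

```python
import re

# Same idea as the XPath pattern, without the redundant leading/trailing ".*"
PATTERN = re.compile(r"esch.*ftsf.*hr")

# Illustrative labels only; the second one is the label from the question.
labels = ["Handelnde Personen:", "Geschäftsführer", "Geschaeftsfuehrer"]

# Keep every label in which the pattern matches somewhere.
matching = [label for label in labels if PATTERN.search(label)]
print(matching)
```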

With XPath 3.0 you could also use the Unicode collation algorithm:

//strong[compare(., 'Geschäftsführer', 
  'http://www.w3.org/2013/collation/UCA?strength=primary') = 0]

(strength=primary ignores case and accents)

But to get anything more advanced than XPath 1.0 in the browser, you would need to deploy Saxon-JS.
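Where no such processor is available, the effect of a case- and accent-insensitive comparison can be roughly approximated in ordinary code. The Python sketch below is not the Unicode collation algorithm; it is a simpler stand-in that decomposes the string, drops combining marks, and case-folds. The helper name `primary_key` is hypothetical.

```python
import unicodedata


def primary_key(s):
    """Rough stand-in for a primary-strength key: strip accents, fold case."""
    decomposed = unicodedata.normalize("NFD", s)  # "ä" becomes "a" + combining diaeresis
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


same = primary_key("Geschäftsführer") == primary_key("GESCHAFTSFUHRER")
print(same)
```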

Another option with 1.0 is to use translate() to remove case and umlauts:

//strong[translate(., 'ABCD..XYZÄÖÜäöüß', 'abcd..xyzaouaous') = 'geschaftsfuhrer']
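The character-for-character mapping that translate() performs can be sketched in Python with `str.maketrans`. This assumes the ".." in the expression above stands for the rest of the alphabet; the names `TABLE` and `normalize` are made up for the example.

```python
import string

# Assumption: the ".." in the XPath stands for the full A-Z / a-z alphabet.
SOURCE = string.ascii_uppercase + "ÄÖÜäöüß"
TARGET = string.ascii_lowercase + "aouaous"

# Each character in SOURCE maps to the character at the same position in TARGET.
TABLE = str.maketrans(SOURCE, TARGET)


def normalize(label):
    """Lower-case the label and replace umlauts, like the translate() call."""
    return label.translate(TABLE)


print(normalize("Geschäftsführer"))
```

As with translate(), the two strings must line up position by position, so each umlaut has to be listed explicitly.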

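Note that in all these examples I have used "." rather than "text()" to get the string value of an element; this is recommended practice, because "." covers all the text inside the element, whereas text() selects only its child text nodes.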

Upvotes: 1
