Andreas Andersson

Reputation: 73

Help with XPath query

I'm using an HTML parser library to parse a web page into XML. On that XML I want to use XPath queries to select the nodes whose text belongs together.

Here's an example of the HTML:

<p><span style="font-family: 'Verdana','sans-serif'; font-size: 32pt;"><span style="font-family: 'Verdana','sans-serif'; font-size: 11pt; mso-bidi-font-size: 18.0pt;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="line-height: 115%; font-family: 'Verdana','sans-serif'; font-size: 36pt; mso-fareast-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-fareast-language: EN-US; mso-ansi-language: SV; mso-bidi-language: AR-SA;">&nbsp;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; VECKA 3</span></span></p>
<p><span style="font-family: 'Verdana','sans-serif'; font-size: 32pt;"></span><span style="font-family: 'Verdana','sans-serif'; font-size: 11pt; mso-bidi-font-size: 18.0pt;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;17-21 JANUARI</span></p>
<p style="margin-bottom: 0pt;"><span style="font-family: 'Verdana','sans-serif'; font-size: 11pt; mso-bidi-font-size: 18.0pt;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="font-family: 'Verdana','sans-serif'; font-size: 11pt; mso-bidi-font-size: 18.0pt;">11.30-14.30</span></p>
<p style="margin-bottom: 0pt;"><span style="font-family: 'Verdana','sans-serif'; font-size: 10pt; mso-bidi-font-size: 15.0pt;">MÅNDAG:&nbsp;Parmesangratinerad tungafile med paprikasås</span></p>
<p style="margin-bottom: 0pt;"><span style="font-family: 'Verdana','sans-serif'; font-size: 10pt; mso-bidi-font-size: 15.0pt;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Biffgryta med syltlök &amp; ris</span></p>

Using XPath on the parsed HTML (a Swedish lunch menu; MÅNDAG means Monday), I want to select the <span> node containing the word MÅNDAG, and also the following <span> node that belongs with it. For example, I want to select the nodes containing the text "MÅNDAG: Parmesangratinerad tungafile med paprikasås" and the text "Biffgryta med syltlök & ris".

I think I need an XPath expression that looks something like this:

"//span[contains(.,'MÅNDAG') or (contains(.,'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;') and ../parent-sibling::/span[contains(.,'MÅNDAG')]]"

Any ideas?

Upvotes: 0

Views: 1267

Answers (2)

user357812


I want to select the <span>-node containing the word MÅNDAG, but also the following <span>-node which belongs to it

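A single XPath 1.0 expression, without a top-level union of two paths. It selects every span that either contains MÅNDAG itself or whose nearest preceding span (in document order, ignoring ancestors) does: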

//span[(.|preceding::span[1])[contains(.,'MÅNDAG')]]
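To see which nodes this expression picks out, here is a minimal sketch in Python that emulates it with the standard library's xml.etree.ElementTree, whose built-in path syntax has neither contains() nor a preceding axis. It assumes the HTML parser has already produced well-formed XML with the style attributes dropped and every &nbsp; turned into the U+00A0 character. SAMPLE, string_value and select_with_preceding are hypothetical names, and the code mimics the expression's semantics rather than being an XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified parser output (assumption): style attributes
# dropped and every &nbsp; already converted to the U+00A0 character.
SAMPLE = """<body>
<p><span><span>\u00a0\u00a0<span>\u00a0</span>\u00a0 VECKA 3</span></span></p>
<p><span></span><span>\u00a0\u00a017-21 JANUARI</span></p>
<p><span>\u00a0\u00a0</span><span>11.30-14.30</span></p>
<p><span>MÅNDAG:\u00a0Parmesangratinerad tungafile med paprikasås</span></p>
<p><span>\u00a0\u00a0 Biffgryta med syltlök &amp; ris</span></p>
</body>"""


def string_value(el):
    """XPath string-value of an element: all descendant text, concatenated."""
    return "".join(el.itertext())


def select_with_preceding(root, word):
    """Hypothetical helper emulating
    //span[(.|preceding::span[1])[contains(., word)]]."""
    # Child -> parent map, used to recognise ancestors.
    parent = {child: p for p in root.iter() for child in p}
    spans = list(root.iter("span"))  # iter() walks in document order
    selected = []
    for i, span in enumerate(spans):
        ancestors = set()
        node = span
        while node in parent:
            node = parent[node]
            ancestors.add(node)
        # preceding::span[1]: the nearest earlier span that is not an ancestor.
        prev = next((s for s in reversed(spans[:i]) if s not in ancestors), None)
        candidates = [span] if prev is None else [span, prev]
        # contains() is a plain substring test on the string-value.
        if any(word in string_value(c) for c in candidates):
            selected.append(span)
    return selected


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for span in select_with_preceding(root, "MÅNDAG"):
        # Show U+00A0 as ordinary spaces and trim the padding.
        print(string_value(span).replace("\u00a0", " ").strip())
```

Run on the sample, it yields the MÅNDAG span and the Biffgryta span that follows it. If lxml is available, the expression itself can instead be evaluated by passing it to an element's xpath() method.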

Upvotes: 0

Michael Kay

Reputation: 163635

In XPath 2.0:

//span[contains(.,'MÅNDAG')]/(. | following::span[1])

In XPath 1.0:

//span[contains(.,'MÅNDAG')] | //span[contains(.,'MÅNDAG')]/following::span[1]
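The XPath 1.0 union can be checked with a small standard-library Python emulation, since xml.etree.ElementTree's own path syntax lacks contains() and the following axis. This is a sketch under assumptions, not an XPath engine: it assumes well-formed parser output with style attributes dropped and &nbsp; already converted to U+00A0, and SAMPLE, string_value and select_with_following_union are hypothetical names. It collects the spans whose string-value contains the word, adds the nearest following span of each (the following axis skips descendants), then removes duplicates and returns document order, as a node-set union does.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified parser output (assumption): style attributes
# dropped and every &nbsp; already converted to the U+00A0 character.
SAMPLE = """<body>
<p><span><span>\u00a0\u00a0<span>\u00a0</span>\u00a0 VECKA 3</span></span></p>
<p><span></span><span>\u00a0\u00a017-21 JANUARI</span></p>
<p><span>\u00a0\u00a0</span><span>11.30-14.30</span></p>
<p><span>MÅNDAG:\u00a0Parmesangratinerad tungafile med paprikasås</span></p>
<p><span>\u00a0\u00a0 Biffgryta med syltlök &amp; ris</span></p>
</body>"""


def string_value(el):
    """XPath string-value of an element: all descendant text, concatenated."""
    return "".join(el.itertext())


def select_with_following_union(root, word):
    """Hypothetical helper emulating
    //span[contains(., word)] | //span[contains(., word)]/following::span[1]."""
    spans = list(root.iter("span"))  # iter() walks in document order
    hits = [s for s in spans if word in string_value(s)]
    chosen = set(hits)
    for hit in hits:
        descendants = set(hit.iter())  # includes hit itself
        start = next(i for i, s in enumerate(spans) if s is hit)
        # following::span[1]: the nearest later span that is not a descendant.
        nxt = next((s for s in spans[start + 1:] if s not in descendants), None)
        if nxt is not None:
            chosen.add(nxt)
    # A node-set union has no duplicates and comes back in document order.
    return [s for s in spans if s in chosen]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for span in select_with_following_union(root, "MÅNDAG"):
        # Show U+00A0 as ordinary spaces and trim the padding.
        print(string_value(span).replace("\u00a0", " ").strip())
```

Because contains() tests a span's whole string-value, a span that merely wraps the matching text also counts as a hit. On the nested VECKA 3 markup, for example, this union returns three spans (the two nested spans containing the text plus the empty span that follows them), whereas the preceding-axis expression in the other answer returns only the two nested ones.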

Upvotes: 0
