Moak

Reputation: 12885

XPath: find data not contained in tags

I'm trying to target data that is not located inside a tag (other than the all-encompassing p):

<p>
    <strong>id1:</strong>data1<br />
    data2<br />
    <strong>id3:</strong>data3<br />
    <strong>id4:</strong>data4
</p>
<p>
    <strong>id1:</strong>data1<br />
    data2<br />
    <strong>id3:</strong>data3
</p>

Any suggestions on how I can get data1, data2, and data3 and identify each of them uniquely? For example, data3 follows strong[.='id3:'] and ends before the <br/>.

EDIT: data2 always follows data1, after a <br/>. Thanks.

Upvotes: 1

Views: 587

Answers (3)

Dimitre Novatchev

Reputation: 243449

Just use:

p/text()

This selects all text nodes that are children of the p elements that are children of the current node.

Or, if you want to exclude white-space-only text-nodes, use:

p/text()[normalize-space()]

If you only want the n-th such text node of each p, use:

p/text()[normalize-space()][1]
p/text()[normalize-space()][2]

. . . . . . . . . .

up to

p/text()[normalize-space()][$k]

where $k is the number of such nodes (if more than one p matches, count them per p, since the positional predicate applies within each p):

count(p/text()[normalize-space()])
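As a rough illustration of what p/text()[normalize-space()] picks out, here is a minimal Python sketch. It assumes the standard library's xml.etree.ElementTree, whose limited XPath subset does not support text(), so the child text nodes are emulated with the .text and .tail attributes; with a full XPath 1.0 engine the expressions above can be used directly. The <root> wrapper and the helper name child_text_nodes are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <root> element
# so that it parses as a single XML document.
SAMPLE = """<root>
<p>
    <strong>id1:</strong>data1<br />
    data2<br />
    <strong>id3:</strong>data3<br />
    <strong>id4:</strong>data4
</p>
<p>
    <strong>id1:</strong>data1<br />
    data2<br />
    <strong>id3:</strong>data3
</p>
</root>"""


def child_text_nodes(p):
    """Emulate p/text()[normalize-space()] for a single p element."""
    # A child text node is either the text before the first child element
    # (p.text) or the text that follows a child element (child.tail).
    candidates = [p.text] + [child.tail for child in p]
    # Drop missing and whitespace-only nodes. Unlike XPath, which returns the
    # nodes untouched, this helper also trims the surrounding whitespace.
    return [t.strip() for t in candidates if t and t.strip()]


root = ET.fromstring(SAMPLE)
for p in root.findall("p"):
    nodes = child_text_nodes(p)
    # len(nodes) plays the role of count(...) for this one p element.
    print(len(nodes), nodes)
```

Indexing the returned list (nodes[0], nodes[1], and so on) corresponds to the positional predicates [1], [2], and so on, applied within one p.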

Upvotes: 2

Michael Kay

Reputation: 163322

To find the text node that immediately follows <strong>id1:</strong>, use strong[.='id1:']/following-sibling::text()[1] (with the p element as your context node).

This assumes that you know there will be such a text node. A more rigorous test is strong[.='id1:']/following-sibling::node()[1][self::text()], which finds the first node (of any kind) after the strong element and returns it only if it turns out to be a text node.

It's not clear how you want to identify data2 in your example.
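A minimal Python sketch of the more rigorous test, plus one possible way to reach data2 based on the question's edit (data2 follows data1 after a <br/>). It assumes the standard library's xml.etree.ElementTree, which cannot evaluate following-sibling, so the .tail attribute stands in for "the text node directly after this element". The labels are written as they appear in the question's markup (with the trailing colon). The helper names and the "stray" text in the sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

# First paragraph from the question. The empty id4 entry and the trailing
# "stray" text are made-up additions that show why the stricter test matters.
P = ET.fromstring(
    "<p>\n"
    "    <strong>id1:</strong>data1<br />\n"
    "    data2<br />\n"
    "    <strong>id3:</strong>data3<br />\n"
    "    <strong>id4:</strong><br />stray\n"
    "</p>"
)


def string_value(elem):
    """String value of an element, as '.' would give in XPath."""
    return "".join(elem.itertext())


def text_after_label(p, label):
    """Rough analogue of strong[.=label]/following-sibling::node()[1][self::text()]."""
    for strong in p.findall("strong"):
        if string_value(strong) == label:
            # .tail holds only the text between </strong> and the next
            # element, so it is empty when an element follows immediately.
            # Whitespace-only text is treated as "nothing" here.
            return (strong.tail or "").strip() or None
    return None


def text_after_next_br(p, label):
    """Text following the first <br /> that comes after the labelled strong."""
    seen_label = False
    for child in p:
        if seen_label and child.tag == "br":
            return (child.tail or "").strip() or None
        if child.tag == "strong" and string_value(child) == label:
            seen_label = True
    return None


print(text_after_label(P, "id1:"))
print(text_after_label(P, "id3:"))
print(text_after_label(P, "id4:"))
print(text_after_next_br(P, "id1:"))
```

For id4: the helper returns None because a <br /> follows the strong element directly, whereas following-sibling::text()[1] would skip ahead and pick up the later "stray" text.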

Upvotes: 3

Guna

Reputation: 1

It can be extracted with text().

For example, the XPath below gives the required result:

//p/text()

Upvotes: 0
