Nakilon

Reputation: 35074

Get the text that follows a tag containing specific text

For example:

<p>
<b>Member Since:</b> Aug. 07, 2010<br><b>Time Played:</b> <span class="text_tooltip" title="Actual Time: 15.09:37:06">16 days</span><br><b>Last Game:</b>
<span class="text_tooltip" title="07/16/2011 23:41">1 minute ago</span>
<br><b>Wins:</b> 1,017<br><b>Losses / Quits:</b> 883 / 247<br><b>Frags / Deaths:</b> 26,955 / 42,553<br><b>Hits / Shots:</b> 690,695 / 4,229,566<br><b>Accuracy:</b> 16%<br>
</p>

I want to get 1,017, the text that follows the tag containing the text Wins:.
If I used a regex, it would be [/<b>Wins:<\/b> ([^<]+)/,1], but how can I do it with Nokogiri and XPath? Or would it be better to parse this part of the page with a regex?
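For reference, the regex form above is a call to String#[] with a capture-group index. A minimal sketch on a trimmed copy of the markup (the variable names html and wins are only for illustration):

```ruby
# Trimmed copy of the markup from the question.
html = '<br><b>Wins:</b> 1,017<br><b>Losses / Quits:</b> 883 / 247<br>'

# String#[] with a regex and a group index returns that capture group,
# or nil when the pattern does not match.
wins = html[/<b>Wins:<\/b> ([^<]+)/, 1]
puts wins
```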

Upvotes: 1

Views: 222

Answers (4)

akuhn

Reputation: 27793

Here is one way:

require 'nokogiri'

doc = Nokogiri::HTML(html) # html holds the page source
puts doc.at('b[text()="Wins:"]').next.text

Upvotes: 3

Dimitre Novatchev

Reputation: 243449

Use:

//*[. = 'Wins:']/following-sibling::node()[1]

If this is ambiguous (selects more than one node), a stricter expression can be used:

//*[. = 'Wins:']/following-sibling::node()[self::text()][1]

Or:

(//*[. = 'Wins:'])[1]/following-sibling::node()[1]

Or:

(//*[. = 'Wins:'])[1]/following-sibling::node()[self::text()][1]

Upvotes: 0

Emiliano Poggi

Reputation: 24826

I would use pure XPath like:

"//b[.='Wins:']/following::node()[1]"

I've heard thousands of times (including from gurus) "never use regex to parse XML". Can you provide some "shocking" reference demonstrating that this advice no longer holds?

Upvotes: 1

Kirill Polishchuk

Reputation: 56162

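You can use this XPath: //*[*/text() = 'Wins:']/text() It returns 1,017 only if that is the sole text node inside the parent element; when the parent holds several text nodes, as the <p> in the question does, it selects all of them.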

About regex, see the Stack Overflow question "RegEx match open tags except XHTML self-contained tags".

Upvotes: 1
