Gaurav Shah

Reputation: 5279

Nokogiri: XPath contains(text(), ...) fails to match

I am trying to extract text from an HTML fragment.

require 'nokogiri'

doc = Nokogiri::HTML('<B> <A href="http://www.asl.com/foo/bar"> Status :</A></B> REGISTERED <BR>')

puts doc.search('//b').first.text
puts doc.search('//b[contains(text(),"Status")]/following-sibling::text()[1]').first.text

The first puts prints "Status :", but the second puts raises an exception: undefined method 'text' for nil:NilClass

Why doesn't contains match here? Or am I doing something wrong?

Upvotes: 0

Views: 1007

Answers (2)

Daniel Rikowski

Reputation: 72504

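I think you have the wrong idea of what text() does in XPath. Unlike the DOM text function, it does not return a concatenated string of all text sub-nodes. Instead it selects the individual text nodes that are direct children of the current element. In //b[contains(text(),"Status")], that means only the text nodes directly inside b are considered, and contains() uses just the first of them, so the "Status" text inside a is never examined.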

In your example //text() would select three text nodes:

 [" ", " Status :", " REGISTERED "]

What you might want is this XPath expression:

//b/a[contains(text(),"Status")]/../following-sibling::text()[1]

Essentially it finds the a element that has the matching text node, then walks up to the parent element (b), and then selects the first text node that follows b as a sibling.

Upvotes: 1

d11wtq

Reputation: 35298

The "Status :" text isn't actually a text node inside <B></B>; it's a text node inside <A></A>.

doc.search('//b/a[contains(text(),"Status")]/text()[1]').first.text

Works for me.

Upvotes: 1
