nokogiri fails in text contains

Question

I am trying to extract text from an HTML document.

doc = Nokogiri::HTML(' Status : REGISTERED ')

puts doc.search('//b').first.text
puts doc.search('//b[contains(text(),"Status")]/following-sibling::text()[1]').first.text

The first puts prints "Status :", but the second puts raises an exception: undefined method 'text' for nil:NilClass

Why doesn't contains match here? Or am I doing something wrong?

Daniel Rikowski · Accepted Answer

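I think you have the wrong idea of text() in XPath. Unlike the DOM function, it does not return a concatenated string of all text sub-nodes. Instead, it selects the individual text nodes that are direct children of the context node. When contains() is handed that node set, XPath 1.0 converts only the first node to a string, so text that sits inside a child element is never compared.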

In your example //text() would select three text nodes:

 [" ", " Status :", " REGISTERED "]

What you might want is this XPath expression:

//b/a[contains(text(),"Status")]/../following-sibling::text()[1]

Essentially, it finds the a element that has the matching text node, then walks up to the parent element (b), and then gets the first text node that follows it as a sibling.
