Reputation: 5279
I am trying to extract text from an HTML fragment, specifically the value that follows the "Status :" link:
require 'nokogiri'
doc = Nokogiri::HTML('<B> <A href="http://www.asl.com/foo/bar"> Status :</A></B> REGISTERED <BR>')
puts doc.search('//b').first.text
puts doc.search('//b[contains(text(),"Status")]/following-sibling::text()[1]').first.text
The first puts prints "Status :", but the second one raises an exception: undefined method 'text' for nil:NilClass.
Why doesn't contains() match here, or am I doing something wrong?
Upvotes: 0
Views: 1007
Reputation: 72504
I think you have the wrong idea of what text() does in XPath. Unlike the DOM function (or Nokogiri's Node#text), it does not return a concatenated string of all text sub-nodes. Instead, it selects individual text nodes.
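In your example, //text() would select three text nodes:
[" ", " Status :", " REGISTERED "]
So in //b[contains(text(),"Status")], text() yields only the text children of b, and contains() looks at the first of them, " ", which does not contain "Status". The search returns an empty set, and .first returns nil.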
What you might want is this XPath expression:
//b/a[contains(text(),"Status")]/../following-sibling::text()[1]
Essentially, it finds the a element that has the matching text node, then walks up to the parent element (b), and then selects the first text node that follows it as a sibling.
Upvotes: 1
Reputation: 35298
The " Status :" text isn't actually a text node directly inside <B></B>; it's a text node inside <A></A>.
doc.search('//b/a[contains(text(),"Status")]/text()[1]').first.text
Works for me.
Upvotes: 1