Reputation: 11
I have not been able to search for a node whose value is an HTML entity.
I have this HTML fragment:
require 'nokogiri'
DATA = "<p>A paragraph <ul><li>Item 1</li><li>&#8853;</li><li>Mango</li></ul></p>"
doc = Nokogiri::HTML(DATA)
p doc.xpath('//li[contains(text(), "Man")]')  # => This returns a NodeSet
p doc.xpath('//li[contains(text(), "8853")]') # => This returns 'Nil'
I can't figure out why the second statement returns nil, or how to fix it.
Upvotes: 0
Views: 2953
Reputation: 160551
When Nokogiri parses the document, it decodes entities, so you have to search for the decoded character rather than the entity's number:
require 'nokogiri'
data = "<p>A paragraph <ul><li>Item 1</li><li>&#8853;</li><li>Mango</li></ul></p>"
doc = Nokogiri::HTML(data)
p doc.search('li').map(&:text)
# => ["Item 1", "⊕", "Mango"]
Notice that the HTML entity has been decoded to its real character.
puts doc.to_html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>A paragraph </p>
<ul>
<li>Item 1</li>
<li>⊕</li>
<li>Mango</li>
</ul>
</body></html>
Again, the entity is decoded, so searching for the decoded character finds the node:
p doc.xpath('//li[contains(text(), "⊕")]')
# => [#<Nokogiri::XML::Element:0x3fcd78d5f16c name="li" children=[#<Nokogiri::XML::Text:0x3fcd78d5ef64 "⊕">]>]
Upvotes: 7