How to get node text without children?

Question

I use Nokogiri to parse an HTML page with content like this:


  Useful text
  

  Useless text

When I call page.css('p.parent').text, Nokogiri returns 'Useful text Useless text', but I need only 'Useful text'.

matt · Accepted Answer

XPath includes the text() node test for selecting text nodes, so you could do:

page.xpath('//p[@class="parent"]/text()')

Selecting HTML classes with XPath gets tricky when the element can belong to more than one class: @class="parent" matches only when the attribute value is exactly "parent", so this might not be ideal.

Fortunately Nokogiri adds the text() selector to CSS, so you can use:

page.css('p.parent > text()')

to get the text nodes that are direct children of p.parent. This will also return some nodes that are whitespace only, so you may have to filter them out.
