Reputation: 9430
I use Nokogiri to parse an HTML page with the following content:
<p class="parent">
Useful text
<br>
<span class="child">Useless text</span>
</p>
When I call page.css('p.parent').text, Nokogiri returns 'Useful text Useless text', but I only need 'Useful text'.
How can I get a node's own text without the text of its children?
Upvotes: 22
Views: 5374
Reputation: 79813
XPath includes the text() node test for selecting text nodes, so you could do:
page.xpath('//p[@class="parent"]/text()')
Selecting HTML classes with XPath can become quite tricky if the element in question could belong to more than one class, because @class="parent" only matches when the attribute value is exactly parent, so this might not be ideal.
Fortunately Nokogiri adds the text() selector to CSS, so you can use:
page.css('p.parent > text()')
to get the text nodes that are direct children of p.parent. This will also return some nodes that are whitespace only, so you may have to filter them out.
Upvotes: 36
Reputation:
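You should be able to use page.css('p.parent > *').remove to drop the child elements. (Calling page.css('p.parent').children.remove would also remove the text nodes, because children includes them.)
Then page.css('p.parent').text will return the text without the child elements.
Note: the page will be modified by the remove.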
Upvotes: -1