Denis Kreshikhin

Reputation: 9430

How to get node text without children?

I use Nokogiri to parse an HTML page with content like this:

<p class="parent">
  Useful text
  <br>
  <span class="child">Useless text</span>
</p>

When I call page.css('p.parent').text, Nokogiri returns 'Useful text Useless text', but I need only 'Useful text'.

How can I get a node's own text without the text of its children?

Upvotes: 22

Views: 5374

Answers (2)

matt

Reputation: 79813

XPath includes the text() node test for selecting text nodes, so you could do:

page.xpath('//p[@class="parent"]/text()')

Using XPath to select HTML classes can become quite tricky if the element in question could belong to more than one class, so this might not be ideal.

Fortunately Nokogiri adds the text() selector to CSS, so you can use:

page.css('p.parent > text()')

to get the text nodes that are direct children of p.parent. This will also return some nodes that are whitespace only, so you may have to filter them out.

Upvotes: 36

user2062950

Reputation:

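You should be able to use page.css('p.parent > *').remove to remove the child elements. (Calling .children.remove would also remove the text nodes, because children includes them.)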

Then your page.css('p.parent').text will return the text without the children nodes.

Note: remove modifies the page in place.

Upvotes: -1
