diasks2

Reputation: 2142

Nokogiri: Ignoring child nodes

I have an XML document like the following:

<doc>
  <header>
    <group>
      <note>group note</note>
    </group>
    <note>header note</note>
  </header>
</doc>

I want to retrieve the note elements that fall under header and not any note elements that fall under group.

I thought this should work but it also picks up the note under group:

 doc.css('header note')

What is the syntax to only grab the note element that is the direct child of the header?

Upvotes: 0

Views: 247

Answers (2)

the Tin Man

Reputation: 160631

The simplest thing is to let Nokogiri find all header note tags, then only use the last one:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<doc>
  <header>
    <group>
      <note>group note</note>
    <group>
    <note>header note</note>
  </header>
</doc>
EOT

doc.css('header note').last.text # => "header note"

Remember, css, like its XPath counterpart xpath and the more generic search, returns a NodeSet. A NodeSet is like an Array in that you can slice it or call first or last on it.

Note though, you could just as easily use:

doc.css('note').last.text # => "header note"

Notice though, the XML in this sample is malformed: the second <group> tag should be a closing </group>. Nokogiri fixes up malformed XML as it parses, which can give you odd results. Check for that situation by looking at doc.errors:

doc.errors
# => [#<Nokogiri::XML::SyntaxError: Opening and ending tag mismatch: group line 5 and header>,
#     #<Nokogiri::XML::SyntaxError: Opening and ending tag mismatch: group line 3 and doc>,
#     #<Nokogiri::XML::SyntaxError: Premature end of data in tag header line 2>,
#     #<Nokogiri::XML::SyntaxError: Premature end of data in tag doc line 1>]

Upvotes: 0

Justin Ko

Reputation: 46846

You can use > in CSS selectors to find child elements. This is in contrast to using a space, which finds all descendant elements at any depth.

In your case:

puts doc.css('header > note')
# prints <note>header note</note>

Upvotes: 1
