SimonMayer

Reputation: 4916

Get children of an element without the text nodes

I am using Nokogiri with Ruby to interpret the contents of an XML file. I would like to get an array (or similar) of all elements that are direct children of <where> in my example. However, I am getting various text nodes (e.g. "\n\t\t\t"), which I do not want. Is there any way I can remove or ignore them?

require 'nokogiri'

@body = "
<xml>
  <request>
    <where>
      <username compare='e'>Admin</username>
      <rank compare='gt'>5</rank>
    </where>
  </request>
</xml>" #in my code, the XML contains tab-indentation, rather than spaces. It is edited here for display purposes.

@noko = Nokogiri::XML(@body)
xml_request = @noko.xpath("//xml/request")
where = xml_request.xpath("where")
c = where.children
p c

The above Ruby script outputs:

[#<Nokogiri::XML::Text:0x100344c "\n\t\t\t">, #<Nokogiri::XML::Element:0x1003350 name="username" attributes=[#<Nokogiri::XML::Attr:0x10032fc name="compare" value="e">] children=[#<Nokogiri::XML::Text:0x1007580 "Admin">]>, #<Nokogiri::XML::Text:0x100734c "\n\t\t\t">, #<Nokogiri::XML::Element:0x100722c name="rank" attributes=[#<Nokogiri::XML::Attr:0x10071d8 name="compare" value="gt">] children=[#<Nokogiri::XML::Text:0x1006cec "5">]>, #<Nokogiri::XML::Text:0x10068a8 "\n\t\t">]

I would like to somehow obtain the following object:

[#<Nokogiri::XML::Element:0x1003350 name="username" attributes=[#<Nokogiri::XML::Attr:0x10032fc name="compare" value="e">] children=[#<Nokogiri::XML::Text:0x1007580 "Admin">]>, #<Nokogiri::XML::Element:0x100722c name="rank" attributes=[#<Nokogiri::XML::Attr:0x10071d8 name="compare" value="gt">] children=[#<Nokogiri::XML::Text:0x1006cec "5">]>]

Currently I can work around the issue using

c.each{|child|
  if !child.text?
    ...
  end
}

but c.length == 5, because the whitespace-only text nodes are still in the collection. It would make my life easier if someone could suggest how to exclude the direct child text nodes from c, so that c.length == 2.

Upvotes: 10

Views: 14321

Answers (1)

Phrogz

Reputation: 303244

You have (at least) three options from which to choose:

  1. Use c = where.element_children instead of c = where.children.

  2. Select only the child elements directly:
    c = xml_request.xpath('./where/*') or
    c = where.xpath('./*')

  3. Filter the list of children to only those that are elements:
    c = where.children.select(&:element?)

Upvotes: 19
