Reputation: 4916
I am using Nokogiri with Ruby to parse the contents of an XML file. I would like to get an array (or similar) of all elements that are direct children of <where> in my example. However, I am also getting various text nodes (e.g. "\n\t\t\t"), which I do not want. Is there any way I can remove or ignore them?
require 'nokogiri'

# In my code, the XML is indented with tabs rather than spaces; it is edited here for display purposes.
@body = "
<xml>
  <request>
    <where>
      <username compare='e'>Admin</username>
      <rank compare='gt'>5</rank>
    </where>
  </request>
</xml>"
@noko = Nokogiri::XML(@body)
xml_request = @noko.xpath("//xml/request")
where = xml_request.xpath("where")
c = where.children
p c
The above Ruby script outputs:
[#<Nokogiri::XML::Text:0x100344c "\n\t\t\t">, #<Nokogiri::XML::Element:0x1003350 name="username" attributes=[#<Nokogiri::XML::Attr:0x10032fc name="compare" value="e">] children=[#<Nokogiri::XML::Text:0x1007580 "Admin">]>, #<Nokogiri::XML::Text:0x100734c "\n\t\t\t">, #<Nokogiri::XML::Element:0x100722c name="rank" attributes=[#<Nokogiri::XML::Attr:0x10071d8 name="compare" value="gt">] children=[#<Nokogiri::XML::Text:0x1006cec "5">]>, #<Nokogiri::XML::Text:0x10068a8 "\n\t\t">]
I would like to somehow obtain the following object:
[#<Nokogiri::XML::Element:0x1003350 name="username" attributes=[#<Nokogiri::XML::Attr:0x10032fc name="compare" value="e">] children=[#<Nokogiri::XML::Text:0x1007580 "Admin">]>, #<Nokogiri::XML::Element:0x100722c name="rank" attributes=[#<Nokogiri::XML::Attr:0x10071d8 name="compare" value="gt">] children=[#<Nokogiri::XML::Text:0x1006cec "5">]>]
Currently I can work around the issue using
c.each do |child|
  unless child.text?
    ...
  end
end
but c.length == 5. It would make my life easier if someone could suggest how to exclude the direct child text nodes from c, so that c.length == 2.
Upvotes: 10
Views: 14321
Reputation: 303244
You have (at least) three options from which to choose:
1. Use c = where.element_children instead of c = where.children.
2. Select only the child elements directly:
   c = xml_request.xpath('./where/*')
   or
   c = where.xpath('./*')
3. Filter the list of children to only those that are elements:
   c = where.children.select(&:element?)
Upvotes: 19