sayth

Reputation: 7038

How to get descendant nodes from XML based on an attribute

I'm trying to get the descendants of a node:

require 'nokogiri'

@doc = Nokogiri::XML(File.open('data/20160521RHIL0.xml'))
nom_id = @doc.xpath('//race/nomination/@id')

race_id.each do |x|
  puts race_id.traverse {|race_id| puts nom_id }
end

I'm looking at two sources of info:

  1. The documentation for Nokogiri::XML::Node, which has

    Nokogiri::XML::Node#children
    
  2. sparklemotion's Cheat-sheet:

    node.traverse {|node| } # yields all children and self to a block, _recursively_
    

This is my test XML:

<meeting id="42977">
  <race id="215411">
    <nomination number="8" saddlecloth="8" horse="Chipanda" id="198926" />
    <nomination number="2" saddlecloth="2" horse="Chifries" id="198965" />
    <nomination number="1" saddlecloth="1" horse="Itpanda" id="199260" />
  </race>
  <race id="215412">
    <nomination number="1" saddlecloth="1" horse="Ruby" id="199634" />
    <nomination number="2" saddlecloth="2" horse="Gems" id="208926" />
    <nomination number="3" saddlecloth="3" horse="Rock" id="122923" />
  </race>
</meeting>

I can use XPath to easily get the race id:

require 'nokogiri'

@doc = Nokogiri::XML(File.open('data/20160521RHIL0.xml'))

race_id = @doc.xpath('//race/@id')
nom_id = @doc.xpath('//race/nomination/@id')

...
215411
215412

How can I get the id and number of each nomination for just the race with id 215411, and store them in a hash (like below)?

{215411 => [{id:198926, number:8},{id:198965, number:2}]}

Upvotes: 1

Views: 224

Answers (2)

the Tin Man

Reputation: 160551

I'd do something like:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<meeting id="42977">
  <race id="215411">
    <nomination number="8" saddlecloth="8" horse="Chipanda" id="198926" />
    <nomination number="2" saddlecloth="2" horse="Chifries" id="198965" />
    <nomination number="1" saddlecloth="1" horse="Itpanda" id="199260" />
  </race>
  <race id="215412">
    <nomination number="1" saddlecloth="1" horse="Ruby" id="199634" />
    <nomination number="2" saddlecloth="2" horse="Gems" id="208926" />
    <nomination number="3" saddlecloth="3" horse="Rock" id="122923" />
  </race>
</meeting>
EOT

race_id = 215411
nominations = doc.at("race[id='#{race_id}']") 
   .search('nomination')
   .map{ |nomination|
     {
      number: nomination['number'].to_i,
      id: nomination['id'].to_i
     }
   }

{race_id => nominations}
# => {215411=>[{:number=>8, :id=>198926}, {:number=>2, :id=>198965}, {:number=>1, :id=>199260}]}

"race[id='#{race_id}']" builds a CSS attribute selector that matches only the race with that id, and at returns the first match. From there, search('nomination') finds the nomination nodes under that race.

Note: I don't use children or traverse because they return every node, including the whitespace text nodes between the tags, not just element nodes. I'd have to add extra logic to ignore the text nodes, which would waste time and space.

Your question isn't clear about this, but if you wanted to return the information for all races, it's a simple tweak:

doc.search('race').map{ |race|
  nominations = race.search('nomination')
     .map{ |nomination|
       {
        number: nomination['number'].to_i,
        id: nomination['id'].to_i
       }
     }

  {race['id'].to_i => nominations}
}
# => [{215411=>[{:number=>8, :id=>198926}, {:number=>2, :id=>198965}, {:number=>1, :id=>199260}]}, {215412=>[{:number=>1, :id=>199634}, {:number=>2, :id=>208926}, {:number=>3, :id=>122923}]}]

Upvotes: 1

SoAwesomeMan

Reputation: 3396

require 'nokogiri'

# xml data
str =<<-EOS
<meeting id="42977">
  <race id="215411">
    <nomination number="8" saddlecloth="8" horse="Chipanda" id="198926" />
    <nomination number="2" saddlecloth="2" horse="Chifries" id="198965" />
    <nomination number="1" saddlecloth="1" horse="Itpanda" id="199260" />
  </race>
  <race id="215412">
    <nomination number="1" saddlecloth="1" horse="Ruby" id="199634" />
    <nomination number="2" saddlecloth="2" horse="Gems" id="208926" />
    <nomination number="3" saddlecloth="3" horse="Rock" id="122923" />
  </race>
</meeting>
EOS

# create doc
doc = Nokogiri::XML(str)

# clean; via http://stackoverflow.com/a/1528247
doc.xpath('//text()[not(normalize-space())]').remove

# parse doc
parsed_doc = doc.xpath('//race').inject({}) {|h,x| h[x.get_attribute('id').to_i] = x.children.map {|y| {id: y.get_attribute('id').to_i, number: y.get_attribute('number').to_i}}; h}
# {215411=>
#  [{:id=>198926, :number=>8},
#   {:id=>198965, :number=>2},
#   {:id=>199260, :number=>1}],
# 215412=>
#  [{:id=>199634, :number=>1},
#   {:id=>208926, :number=>2},
#   {:id=>122923, :number=>3}]}

# select via id
parsed_doc.select {|k,v| k == 215411}
# {215411=>
#  [{:id=>198926, :number=>8},
#   {:id=>198965, :number=>2},
#   {:id=>199260, :number=>1}]}

Here's the one-liner as a multi-liner:

parsed_doc = doc.xpath('//race').inject({}) do |h,x|
  h[x.get_attribute('id').to_i] = x.children.map do |y|
    {
      id: y.get_attribute('id').to_i,
      number: y.get_attribute('number').to_i
    }
  end
  h
end

Upvotes: 1
