Reputation: 267
I am trying to parse some XML into an array. Here is a chunk of the XML I am parsing:
<Group_add>
<Group org_pac_id="0000000001">
<org_legal_name>NAME OF GROUP</org_legal_name>
<par_status>Y</par_status>
<Quality>
<GPRO_status>N</GPRO_status>
<ERX_status>N</ERX_status>
</Quality>
<Profile_Spec_list>
<Spec>08</Spec>
</Profile_Spec_list>
<Location adrs_id="OR974772594SP2280XRDXX300">
<other_tags>xx</other_tags>
</Location>
</Group>
<Group org_pac_id="0000000002">
...
</Group>
</Group_add>
I am currently able to get the attribute of "Group" and the text within "org_legal_name" and have them added to an array with the code below.
def parse(input_file, output_array)
  puts "Parsing #{input_file} data. Please wait..."
  doc = Nokogiri::XML(File.read(input_file))
  doc.xpath("//Group").each do |group|
    ["org_legal_name"].each do |name|
      output_array << [group["org_pac_id"], group.at(name).inner_html]
    end
  end
end
I would like to add the location "adrs_id" to the output_array as well, but can't seem to figure that part out.
Example output:
["0000000001", "NAME OF GROUP", "OR974772594SP2280XRDXX300"]
["0000000002", "NAME OF GROUP 2", "OR974772594SP2280XRDXX301"]
Upvotes: 2
Views: 828
Reputation: 160551
Starting with:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<xml>
<Group org_pac_id="0000000001">
<org_legal_name>NAME OF GROUP</org_legal_name>
<Location adrs_id="OR974772594SP2280XRDXX300">
<other_tags>xx</other_tags>
</Location>
</Group>
</xml>
EOT
Based on your XML I'd use:
array = []
array << doc.at('org_legal_name').text
array << doc.at('Location')['adrs_id']
array # => ["NAME OF GROUP", "OR974772594SP2280XRDXX300"]
If the XML is more complex, which I suspect it is, then we need an accurate, minimal example of it.
Based on the updated XML (which is still suspicious), here's what I'd use. Notice that I stripped out information that isn't germane to the question, reducing the XML to the minimum needed:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<xml>
<Group_add>
<Group org_pac_id="0000000001">
<org_legal_name>NAME OF GROUP</org_legal_name>
<Location adrs_id="OR974772594SP2280XRDXX300">
<other_tags>xx</other_tags>
</Location>
</Group>
<Group org_pac_id="0000000002">
<org_legal_name>NAME OF ANOTHER GROUP</org_legal_name>
<Location adrs_id="OR974772594SP2280XRDXX301">
<other_tags>xx</other_tags>
</Location>
</Group>
</Group_add>
</xml>
EOT
data = doc.search('Group').map do |group|
  [
    group['org_pac_id'],
    group.at('org_legal_name').text,
    group.at('Location')['adrs_id']
  ]
end
Which results in:
data # => [["0000000001", "NAME OF GROUP", "OR974772594SP2280XRDXX300"], ["0000000002", "NAME OF ANOTHER GROUP", "OR974772594SP2280XRDXX301"]]
Think of the group variable being passed into the block as a placeholder for the current Group node. From that node it's easy to look downward into the DOM and grab things that apply only to that particular node.
Note that I'm using CSS selectors instead of XPath. They're easier to read and usually work fine. Sometimes we need the added functionality of XPath, and sometimes Nokogiri's use of jQuery's CSS accessors gives us things that are useful.
Upvotes: 3