Reputation: 1870
I am using Nokogiri to parse an XML file that has (roughly) the following structure:
<diag>
  <name>A00</name>
  <desc>Cholera</desc>
  <diag>
    <name>A00.0</name>
    <desc>Cholera due to Vibrio cholerae 01, biovar cholerae</desc>
  </diag>
  <diag>
    ...
  </diag>
  ...
</diag>
As you can see, this tree has diag nodes that can be nested arbitrarily deep, and each nested node is a more specific description of its parent.
I want to "flatten" this tree so that, rather than having A00.0 nested within A00, I just have a list like this:
A00
A00.0
A00.1
...
A00.34
...
A01
...
What I have so far looks like this:
require 'nokogiri'
icd10 = File.new("icd10.xml", "r")
doc = Nokogiri::XML(icd10.read) do |config|
  config.strict.noblanks
end
icd10.close
@diags = {}
@diag_count = 0
def get_diags(node)
  node.children.each do |n|
    if n.name == "diag"
      @diags[@diag_count] = n
      @diag_count += 1
      get_diags(n)
    end
  end
end
# The xml file has sections but what I really want are the contents of the sections
doc.xpath('.//section').each do |n|
  get_diags(n)
end
So far this works in that I do get all the diag elements within the file, but the problem is that the parent nodes still contain all the content found in their nested nodes (e.g. @diags[0] contains the A00, A00.0, A00.1, etc. nodes, while @diags[1] contains just the A00.0 content).
How can I exclude nested elements from the parent element while traversing the XML content in get_diags? Thanks in advance!
== EDIT ==
So I added this to my get_diags method:
def get_diags(node)
  node.children.each do |n|
    if n.name == "diag"
      f = Nokogiri::XML.fragment(n.to_s)
      f.search('.//diag').children.each do |d|
        if d.name == "diag"
          d.remove
        end
      end
      @diags[@diag_count] = f
      @diag_count += 1
      get_diags(n)
    end
  end
end
Now @diags holds fragments of XML with all the nested <diag>...</diag> elements removed, which in one sense is what I want, but overall this is very ugly, and I was wondering if anyone could share a better way to go about this. Thanks
Upvotes: 0
Views: 202
Reputation: 108049
The XPath '//diag' will give you each <diag> node in turn, in document order, no matter how deeply nested. Then you can just extract the text values of each node's name and desc children:
diags = doc.xpath('//diag').map do |diag|
  Hash[
    %w(name desc).map do |key|
      [key, diag.xpath(key).text]
    end
  ]
end
pp diags
# => [{"desc"=>"Cholera", "name"=>"A00"},
# =>  {"desc"=>"Cholera due to Vibrio cholerae 01, biovar cholerae",
# =>   "name"=>"A00.0"}]
If you wish to create a new XML tree with a different structure, I wouldn't bother trying to transform the original. Just take the extracted data and use it to create the new tree:
builder = Nokogiri::XML::Builder.new do |xml|
  xml.diagnoses do
    diags.each do |diag|
      xml.diag {
        # Pass the text as an argument; `xml.name = ...` would create
        # an element literally named "name=".
        xml.name diag['name']
        xml.desc diag['desc']
      }
    end
  end
end
puts builder.to_xml
# => <?xml version="1.0"?>
# => <diagnoses>
# =>   <diag>
# =>     <name>A00</name>
# =>     <desc>Cholera</desc>
# =>   </diag>
# =>   <diag>
# =>     <name>A00.0</name>
# =>     <desc>Cholera due to Vibrio cholerae 01, biovar cholerae</desc>
# =>   </diag>
# => </diagnoses>
Upvotes: 2