Reputation: 1413
I realize that there are many similar questions, but I am still unable to find the specific answer that I am looking for.
I am using Perl with the XML::LibXML library to read information from an XML file. The XML file has many nodes and many child nodes (and child child nodes, etc). I am trying to pull the information out of the XML file 'per node' but am really getting into the weeds trying to figure out how to do that.
Here is just an example of what I am trying to do:
#!/usr/bin/perl -w
use XML::LibXML;
open(my $xml_fh, '<', 'test.xml') or die "Cannot open test.xml: $!";
my $dom = XML::LibXML->load_xml(IO => $xml_fh);
close($xml_fh);
foreach my $chapter ($dom->findnodes('/file/chapter')) {
    my $chapterNumber = $chapter->findvalue('@number');
    print "Chapter #$chapterNumber\n";
    #I tried $dom->findnodes('/file/chapter/section') <-- spelling out the xPath with same results..
    foreach my $section ($dom->findnodes('//section')) {
        my $sectionNumber = $section->findvalue('@number');
        print " Section #$sectionNumber\n";
        foreach my $subsection ($dom->findnodes('//subsection')) {
            my $subsectionNumber = $subsection->findvalue('@number');
            print " SubSection $subsectionNumber\n";
        }
    }
}
This specific XML file is set up like this:
<file>
<chapter number="1">
<section number="abc123">
There is some data here I'd like to get to
<subsection number="abc123.(s)(4)">
Some additional data here
<subsection number="deeperSubSec">
There might even be deeper subsections
</subsection>
</subsection>
</section>
</chapter>
<chapter number="208">
<section number="dgfj23">
There is some data here I'd like to get to also
<subsection number="dgfj23.(s)(4)">
Some additional data here also
<subsection number="deeperSubSec44">
There might even be deeper subsections also
</subsection>
</subsection>
</section>
</chapter>
<chapter number="998">
<section number="xxxid">
There is even more data here I'd like to get to also
<subsection number="xxxid.(s)(4)">
Some additional data also here too
<subsection number="deeperSubSec999">
There might even be deeper subsections also again
</subsection>
</subsection>
</section>
</chapter>
</file>
Unfortunately, what I wind up with is just a list of repeating data. I am sure that this is because of my nested for loops, but I am really not grasping the fundamentals of how to operate on this data type. Hopefully someone has some resources or insight they could provide.
Here is my current output:
Chapter #1
Section #abc123
SubSection abc123.(s)(4)
SubSection deeperSubSec
SubSection dgfj23.(s)(4)
SubSection deeperSubSec44
SubSection xxxid.(s)(4)
SubSection deeperSubSec999
Section #dgfj23
SubSection abc123.(s)(4)
SubSection deeperSubSec
SubSection dgfj23.(s)(4)
SubSection deeperSubSec44
SubSection xxxid.(s)(4)
SubSection deeperSubSec999
Section #xxxid
SubSection abc123.(s)(4)
SubSection deeperSubSec
SubSection dgfj23.(s)(4)
SubSection deeperSubSec44
SubSection xxxid.(s)(4)
SubSection deeperSubSec999
Chapter #208
Section #abc123
SubSection abc123.(s)(4)
SubSection deeperSubSec
SubSection dgfj23.(s)(4)
SubSection deeperSubSec44
SubSection xxxid.(s)(4)
SubSection deeperSubSec999
Section #dgfj23
SubSection abc123.(s)(4)
SubSection deeperSubSec
SubSection dgfj23.(s)(4)
SubSection deeperSubSec44
SubSection xxxid.(s)(4)
SubSection deeperSubSec999
Section #xxxid
SubSection abc123.(s)(4)
SubSection deeperSubSec
SubSection dgfj23.(s)(4)
SubSection deeperSubSec44
SubSection xxxid.(s)(4)
SubSection deeperSubSec999
Chapter #998
Section #abc123
SubSection abc123.(s)(4)
SubSection deeperSubSec
SubSection dgfj23.(s)(4)
SubSection deeperSubSec44
SubSection xxxid.(s)(4)
SubSection deeperSubSec999
Section #dgfj23
SubSection abc123.(s)(4)
SubSection deeperSubSec
SubSection dgfj23.(s)(4)
SubSection deeperSubSec44
SubSection xxxid.(s)(4)
SubSection deeperSubSec999
Section #xxxid
SubSection abc123.(s)(4)
SubSection deeperSubSec
SubSection dgfj23.(s)(4)
SubSection deeperSubSec44
SubSection xxxid.(s)(4)
SubSection deeperSubSec999
So for each chapter, I am reading ALL sections, and for each of those I am reading ALL subsections, over and over again.
What I want to do is read, for each chapter, the associated sections, then for each of those sections, the associated subsections and any applicable sub-subsections therein..
like this:
Chapter #1
Section #abc123
Subsection #abc123.(s)(4)
Sub-Subsection #deeperSubSec
Chapter #208
Section #dgfj23
Subsection #dgfj23.(s)(4)
Sub-Subsection #deeperSubSec44
etc...
Additionally, eventually, after I figure out how the basic operation works, I'll need to get access to the data contained within each chapter, section, subsection, etc. But I think I need to walk before I run, so I'll go with trying to get the simple value of the attributes first..
Thank you for your help.
Upvotes: 1
Views: 469
Reputation: 1413
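So I think I figured it out. I was operating on the $dom object the entire time, which contains the entire XML tree. An XPath that starts with // also searches from the document root no matter where it is called, so every loop saw every section and every subsection. I believe what I needed to do was operate on the piece of the tree that I am looking at, with paths relative to that piece, like this: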
#!/usr/bin/perl -w
use XML::LibXML;
open(my $xml_fh, '<', 'test.xml') or die "Cannot open test.xml: $!";
my $dom = XML::LibXML->load_xml(IO => $xml_fh);
close($xml_fh);
for my $chapter ($dom->findnodes('/file/chapter')) {
    print "Chapter #" . $chapter->findvalue('@number') . "\n";
    foreach my $section ($chapter->findnodes('section')) {
        print " Section #" . $section->findvalue('@number') . "\n";
        foreach my $subsection ($section->findnodes('subsection')) {
            print " Subsection #" . $subsection->findvalue('@number') . "\n";
        }
    }
}
which results in output more like I was hoping for:
Chapter #1
Section #abc123
Subsection #abc123.(s)(4)
Chapter #208
Section #dgfj23
Subsection #dgfj23.(s)(4)
Chapter #998
Section #xxxid
Subsection #xxxid.(s)(4)
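To check my understanding of why the original loops repeated everything, here is a minimal sketch of how the three kinds of path behave when called on a single chapter node. The inline XML string and the variable names are made up for illustration; they are not from my real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical two-chapter document, inlined so the sketch runs without test.xml.
my $xml = <<'XML';
<file>
  <chapter number="1">
    <section number="a">
      <subsection number="a.1"><subsection number="a.1.1"/></subsection>
    </section>
  </chapter>
  <chapter number="2">
    <section number="b"><subsection number="b.1"/></section>
  </chapter>
</file>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my ($first) = $dom->findnodes('/file/chapter');   # the first chapter only

# '//' is absolute: it searches from the document root,
# even though it is called on a chapter node.
my @absolute = $first->findnodes('//section');

# A bare element name is relative: only direct children of the chapter.
my @children = $first->findnodes('section');

# './/' is relative too, but matches descendants at any depth.
my @descendants = $first->findnodes('.//subsection');

printf "//section from chapter 1:     %d\n", scalar @absolute;
printf "section from chapter 1:       %d\n", scalar @children;
printf ".//subsection from chapter 1: %d\n", scalar @descendants;
```

With this sample, the absolute path finds the sections of both chapters, the relative path finds only the one section inside chapter 1, and './/subsection' also picks up the nested sub-subsection that a plain 'subsection' step misses.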
Here is a slightly neater version of the same script, which makes it clearer that each loop queries only the node handed to it by the enclosing loop:
#!/usr/bin/perl -w
use XML::LibXML;
open(my $xml_fh, '<', 'test.xml') or die "Cannot open test.xml: $!";
my $dom = XML::LibXML->load_xml(IO => $xml_fh);
close($xml_fh);
my @chapters = $dom->findnodes('/file/chapter');
for my $chapter (@chapters) {
    my $chapterNo = $chapter->findvalue('@number');
    print "Chapter #$chapterNo\n";
    my @sections = $chapter->findnodes('section');
    for my $section (@sections) {
        my $sectionNo = $section->findvalue('@number');
        print " Section #$sectionNo\n";
        my @subsections = $section->findnodes('subsection');
        for my $subsection (@subsections) {
            my $subsectionNo = $subsection->findvalue('@number');
            print " Subsection #$subsectionNo\n";
        }
    }
}
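The fixed loops above stop at the first level of subsections, so the deeper ones from my question are still not printed. One possible way to reach them, and to get at the text inside each element at the same time, is a small recursive walk. This is only a sketch under some assumptions: walk() is a helper name I made up, the XML is a trimmed copy of my sample inlined as a string, and I am assuming the text I care about is the first text node of each element.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed sample, inlined so the sketch runs without test.xml.
my $xml = <<'XML';
<file>
  <chapter number="1">
    <section number="abc123">
      There is some data here
      <subsection number="abc123.(s)(4)">
        Some additional data here
        <subsection number="deeperSubSec">
          There might even be deeper subsections
        </subsection>
      </subsection>
    </section>
  </chapter>
</file>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# walk() is a hypothetical helper: it records one line for the given node,
# then calls itself on that node's own section/subsection children.
sub walk {
    my ($node, $depth, $lines) = @_;

    # First text node of this element, with surrounding whitespace collapsed.
    my $text = $node->findvalue('normalize-space(text())');

    my $line = ('  ' x $depth)
             . ucfirst($node->nodeName) . ' #' . $node->findvalue('@number');
    $line .= ": $text" if length $text;
    push @$lines, $line;

    # Relative path: only the direct children of the current node.
    walk($_, $depth + 1, $lines) for $node->findnodes('section | subsection');
    return $lines;
}

my @lines;
walk($_, 0, \@lines) for $dom->findnodes('/file/chapter');
print "$_\n" for @lines;
```

Because every call only asks for the children of the node it was given, each subsection is printed once, indented under its parent, no matter how deeply it is nested.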
Upvotes: 3