Reputation: 879
The XML Structure is as below:
<Entities>
<Entity>
<EntityName>.... </EntityName>
<EntityType>.... </EntityType>
<Tables>
<DataTables>
<DataTable>1</DataTable>
<DataTable>2</DataTable>
<DataTable>3</DataTable>
<DataTable>4</DataTable>
</DataTables>
<OtherTables>
<OtherTable>5</OtherTable>
<OtherTable>6</OtherTable>
</OtherTables>
</Tables>
</Entity>
.
.
.
</Entities>
I need to parse the file based on the Entity name selected and retrieve all the tables specifically in the order mentioned. How do I do this in Perl and which module should be used?
Upvotes: 2
Views: 2684
Reputation: 3084
My favourite module for parsing XML in Perl is XML::Twig (tutorial).
Code Sample:
use strict;
use warnings;
use XML::Twig;

# Path to the XML file, taken from the command line
my $xml_file = shift @ARGV or die "Usage: $0 file.xml\n";

my $twig = XML::Twig->new(
    twig_handlers => {
        # Calls get_tables for each Entity element once it has been parsed
        Entity => sub { get_tables($_) },
    },
    pretty_print  => 'indented',  # output will be nicely formatted
    empty_tags    => 'html',      # outputs <empty_tag />
    keep_encoding => 1,
);
$twig->parsefile($xml_file);
$twig->flush;    # prints whatever remains of the document

sub get_tables {
    my $entity = shift;
    my $tables = $entity->first_child('Tables');

    # Retrieves the DataTable sub-elements of DataTables, in document order
    my @data_tables = $tables->first_child('DataTables')->children('DataTable');
    # Do stuff with the DataTables

    # Retrieves the OtherTable sub-elements of OtherTables, in document order
    my @other_tables = $tables->first_child('OtherTables')->children('OtherTable');
    # Do stuff with the OtherTables

    # Frees the memory used by the elements parsed so far
    $entity->purge;
}
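The question also asks to pick one Entity by name, which the sample above does not do. A minimal sketch of that filter follows, assuming the XML is available as a string; the helper name tables_for_entity and the variable $sample are made up for illustration, and the sample data is borrowed from another answer in this thread.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not part of XML::Twig): returns the text of the
# DataTable entries, then the OtherTable entries, of the Entity whose
# EntityName equals $wanted.
sub tables_for_entity {
    my ($xml, $wanted) = @_;
    my @tables;
    my $twig = XML::Twig->new(
        twig_handlers => {
            Entity => sub {
                my ($t, $entity) = @_;
                my $name = $entity->first_child_text('EntityName');
                $name =~ s/^\s+|\s+$//g;    # ignore padding around the name
                my $tables = $entity->first_child('Tables');
                if ($name eq $wanted and $tables) {
                    for my $pair (['DataTables', 'DataTable'],
                                  ['OtherTables', 'OtherTable']) {
                        my ($container, $item) = @$pair;
                        my $group = $tables->first_child($container) or next;
                        # children() returns the elements in document order
                        push @tables, map { $_->text } $group->children($item);
                    }
                }
                $t->purge;    # free what has been parsed so far
            },
        },
    );
    $twig->parse($xml);
    return @tables;
}

my $sample = <<'XML';
<Entities>
  <Entity>
    <EntityName>foo</EntityName>
    <Tables>
      <DataTables><DataTable>1</DataTable><DataTable>2</DataTable></DataTables>
      <OtherTables><OtherTable>3</OtherTable><OtherTable>4</OtherTable></OtherTables>
    </Tables>
  </Entity>
  <Entity>
    <EntityName>bar</EntityName>
    <Tables>
      <DataTables><DataTable>5</DataTable><DataTable>6</DataTable></DataTables>
      <OtherTables><OtherTable>7</OtherTable><OtherTable>8</OtherTable></OtherTables>
    </Tables>
  </Entity>
</Entities>
XML

print join(', ', tables_for_entity($sample, 'foo')), "\n";
```

With this sample the call for 'foo' should list the values 1 to 4. For a file on disk, parsefile could be used in place of parse.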
Upvotes: 8
Reputation: 8591
I prefer XML::LibXML, which allows you (and me) to use XPath to select elements.
You may wish to look at a script I wrote with it.
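As a rough illustration of the XPath approach with XML::LibXML, here is a minimal sketch for the structure in the question. The helper name libxml_tables_for_entity and the variable $sample_xml are hypothetical, and the entity name is compared in Perl rather than interpolated into the XPath expression, to sidestep quoting problems.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns the text of every DataTable and OtherTable
# under the Entity named $wanted, in document order.
sub libxml_tables_for_entity {
    my ($xml, $wanted) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    my @tables;
    for my $entity ($doc->findnodes('/Entities/Entity')) {
        # normalize-space() trims padding around the name
        my $name = $entity->findvalue('normalize-space(EntityName)');
        next unless $name eq $wanted;
        # The union of the two paths comes back in document order
        push @tables, map { $_->textContent } $entity->findnodes(
            'Tables/DataTables/DataTable | Tables/OtherTables/OtherTable');
    }
    return @tables;
}

my $sample_xml = <<'XML';
<Entities>
  <Entity>
    <EntityName>foo</EntityName>
    <Tables>
      <DataTables><DataTable>1</DataTable><DataTable>2</DataTable></DataTables>
      <OtherTables><OtherTable>3</OtherTable><OtherTable>4</OtherTable></OtherTables>
    </Tables>
  </Entity>
</Entities>
XML

print join(', ', libxml_tables_for_entity($sample_xml, 'foo')), "\n";
```

To read from a file instead, load_xml accepts location => $path in place of string => $xml.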
Upvotes: 0
Reputation: 139531
The XPath 1.0 specification defines document order as follows:
There is an ordering, document order, defined on all the nodes in the document corresponding to the order in which the first character of the XML representation of each node occurs in the XML representation of the document after expansion of general entities. Thus, the root node will be the first node. Element nodes occur before their children. Thus, document order orders element nodes in order of the occurrence of their start-tag in the XML (after expansion of entities).
In other words, the order in which things occur in the XML document. The XML::XPath module produces results in document order. For example:
#! /usr/bin/perl
use warnings;
use strict;
use XML::XPath;
my $entity_template = "/Entities"
. "/Entity"
. "[EntityName='!!NAME!!']"
;
my $tables_path = join "|" =>
qw( ./Tables/DataTables/DataTable
./Tables/OtherTables/OtherTable );
my $xp = XML::XPath->new(ioref => *DATA);
foreach my $ename (qw/ foo bar /) {
print "$ename:\n";
(my $path = $entity_template) =~ s/!!NAME!!/$ename/g;
foreach my $n ($xp->findnodes($path)) {
foreach my $t ($xp->findnodes($tables_path, $n)) {
print $t->toString, "\n";
}
}
}
__DATA__
The first expression searches for <Entity> elements that have an <EntityName> child whose string-value is the selected Entity name. From each match, we then look for <DataTable> or <OtherTable> elements.
Given input of
<Entities>
<Entity>
<EntityName>foo</EntityName>
<EntityType>type1</EntityType>
<Tables>
<DataTables>
<DataTable>1</DataTable>
<DataTable>2</DataTable>
</DataTables>
<OtherTables>
<OtherTable>3</OtherTable>
<OtherTable>4</OtherTable>
</OtherTables>
</Tables>
</Entity>
<Entity>
<EntityName>bar</EntityName>
<EntityType>type2</EntityType>
<Tables>
<DataTables>
<DataTable>5</DataTable>
<DataTable>6</DataTable>
</DataTables>
<OtherTables>
<OtherTable>7</OtherTable>
<OtherTable>8</OtherTable>
</OtherTables>
</Tables>
</Entity>
</Entities>
the output is
foo:
<DataTable>1</DataTable>
<DataTable>2</DataTable>
<OtherTable>3</OtherTable>
<OtherTable>4</OtherTable>
bar:
<DataTable>5</DataTable>
<DataTable>6</DataTable>
<OtherTable>7</OtherTable>
<OtherTable>8</OtherTable>
To extract the string-values (the “inner text”), change $tables_path to
my $tables_path = ". / Tables / DataTables / DataTable / text() |
. / Tables / OtherTables / OtherTable / text()";
Yes, that's repetitive—because XML::XPath implements XPath 1.0.
Output:
foo:
1
2
3
4
bar:
5
6
7
8
Upvotes: 2
Reputation: 8342
See XML::Simple. Before using it, keep in mind a few points from its documentation:
XML::Simple is able to present a simple API because it makes some assumptions on your behalf. [...]
For event based parsing, use SAX (do not set out to write any new code for XML::Parser's handler API - it is obsolete).
For tree-based parsing, you could choose between the 'Perlish' approach of XML::Twig and more standards based DOM implementations - preferably one with XPath support.
source: XML-Simple
For more detail about Perl-XML, see Perl-XML
Upvotes: -1